├── .gitignore ├── LICENSE ├── README.md ├── convert_darknet.py ├── data ├── coco.py ├── config.py ├── data_augment.py ├── voc0712.py └── voc_eval.py ├── demo.py ├── eval.py ├── images ├── dog.jpg ├── eagle.jpg └── person.jpg ├── layers ├── multiyolo_loss.py ├── weight_mseloss.py ├── yolo_layer.py └── yolo_loss.py ├── make.sh ├── model ├── __init__.py ├── darknet53.py └── yolo.py ├── output ├── output_dog.jpg ├── output_eagle.jpg └── output_person.jpg ├── train.py └── utils ├── box_utils.py ├── build.py ├── gen_anchors.py ├── nms ├── __init__.py ├── cpu_nms.pyx ├── gpu_nms.hpp ├── gpu_nms.pyx ├── nms_kernel.cu └── py_cpu_nms.py ├── nms_wrapper.py ├── preprocess.py ├── pycocotools ├── __init__.py ├── _mask.pyx ├── coco.py ├── cocoeval.py ├── mask.py ├── maskApi.c └── maskApi.h └── timer.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | *log 28 | *.json 29 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Max deGroot, Ellis Brown 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## YOLO v3 Implementation with PyTorch 2 | > This repository contains only the detection module; it does not need the cfg file from the original YOLOv3, because the network is implemented directly in PyTorch. 3 | 4 | This repository is based on the official code of [YOLOv3](https://github.com/pjreddie/darknet) and [pytorch-yolo-v3](https://github.com/ayooshkathuria/pytorch-yolo-v3). There is already another PyTorch implementation of YOLOv3, but it uses a config file rather than a normal PyTorch approach to defining the network. One of the goals of this repository is to remove the cfg file.
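Instead of parsing a cfg file, the network is defined with ordinary PyTorch modules in `model/darknet53.py` and `model/yolo.py`. As a rough illustration of the idea (a sketch only; the class name below is illustrative, not the repository's exact code), the basic Darknet building block is just a small `nn.Module`:

```python
import torch.nn as nn

class ConvBN(nn.Module):
    """Convolution + BatchNorm + LeakyReLU, the basic Darknet building block."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super(ConvBN, self).__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, pad, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```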
5 | 6 | ## Requirements 7 | 8 | * Python 3.5 9 | * OpenCV 10 | * PyTorch 0.4 11 | 12 | ## Installation 13 | 14 | * Install PyTorch-0.4.0 by selecting your environment on the website and running the appropriate command. 15 | * Clone this repository 16 | * Compile the nms 17 | * Convert yolov3.weights to PyTorch 18 | 19 | ```shell 20 | cd YOLOv3_Pytorch 21 | ./make.sh 22 | 23 | mkdir weights 24 | cd weights 25 | wget https://pjreddie.com/media/files/yolov3.weights 26 | cd .. 27 | python convert_darknet.py --version coco --weights ./weights/yolov3.weights --save_name ./weights/convert_yolov3_coco.pth 28 | # this will produce convert_yolov3_coco.pth 29 | ``` 30 | 31 | ## Train 32 | > We only train on the VOC dataset because we do not have enough GPUs to train on COCO. This is still an experimental repository, and it does not yet fully reproduce the original results. 33 | 34 | ### dataset 35 | [merge VOC dataset](https://github.com/yqyao/DRFNet#voc-dataset) 36 | 37 | * structure 38 | 39 | ./data/datasets/VOCdevkit0712/VOC0712/Annotations 40 | ./data/datasets/VOCdevkit0712/VOC0712/ImageSets 41 | ./data/datasets/VOCdevkit0712/VOC0712/JPEGImages 42 | 43 | * COCO 44 | 45 | Same as [COCO](https://github.com/yqyao/DRFNet#coco-dataset) 46 | 47 | ### train 48 | > You can enable multiscale training by setting `multiscale` in `voc_config` in `data/config.py`. 49 | 50 | * convert weights 51 | ```shell 52 | cd weights 53 | wget https://pjreddie.com/media/files/darknet53.conv.74 54 | cd ../ 55 | python convert_darknet.py --version darknet53 --weights ./weights/darknet53.conv.74 --save_name ./weights/convert_darknet53.pth 56 | ``` 57 | 58 | * train yolov3 59 | 60 | ```shell 61 | python train.py --input_wh 416 416 -b 64 --subdivisions 4 -d VOC --basenet ./weights/convert_darknet53.pth 62 | 63 | ``` 64 | 65 | ### eval 66 | 67 | ```shell 68 | 69 | python eval.py --weights ./weights/convert_yolov3_voc.pth --dataset VOC --input_wh 416 416 70 | ``` 71 | > "darknet voc" was trained with Darknet; "pytorch voc" was trained with this repository. 72 | 73 | **results** 74 | 75 | | | darknet voc 608 | darknet voc 416 | pytorch voc 608 | pytorch voc 416 | 76 | | :-: | :-: | :-: | :-: | :-: | 77 | | mAP | 77.2% | 76.2% | 74.7% | 74.9% | 78 | | time | 27ms | 18ms | 27ms | 18ms | 79 | 80 | ## Demo 81 | 82 | ```shell 83 | 84 | python demo.py --images images --save_path ./output --weights ./weights/convert_yolov3_coco.pth -d COCO 85 | 86 | ``` 87 | 88 | ## Example 89 | 90 | 91 | 92 | 93 | ## References 94 | - [YOLOv3: An Incremental Improvement](https://pjreddie.com/media/files/papers/YOLOv3.pdf) 95 | 96 | - [Original Implementation (Darknet)](https://github.com/pjreddie/darknet) 97 | 98 | - [pytorch-yolo-v3](https://github.com/ayooshkathuria/pytorch-yolo-v3) 99 | 100 | - [pytorch-yolo2](https://github.com/marvis/pytorch-yolo2) 101 | -------------------------------------------------------------------------------- /convert_darknet.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import numpy as np 9 | from data.config import voc_config, coco_config 10 | from model.yolo import Yolov3 11 | from model.darknet53 import Darknet53 12 | import argparse 13 | import os 14 | 15 | def copy_weights(bn, conv, ptr, weights, use_bn=True): 16 | if use_bn: 17 | num_bn_biases = bn.bias.numel() 18 | 19 | #Load the weights 20 | bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases]) 21 |
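# NOTE: a Darknet .weights file is a flat float32 array. For a conv layer followed
# by batch norm it stores, in order: BN biases (beta), BN weights (gamma),
# BN running mean, BN running var, and then the conv kernel weights; the loads in
# this function simply advance `ptr` through that flat array in the same order.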
ptr += num_bn_biases 22 | 23 | bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 24 | ptr += num_bn_biases 25 | 26 | bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 27 | ptr += num_bn_biases 28 | 29 | bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 30 | ptr += num_bn_biases 31 | 32 | #Cast the loaded weights into dims of model weights. 33 | bn_biases = bn_biases.view_as(bn.bias.data) 34 | bn_weights = bn_weights.view_as(bn.weight.data) 35 | bn_running_mean = bn_running_mean.view_as(bn.running_mean) 36 | bn_running_var = bn_running_var.view_as(bn.running_var) 37 | 38 | #Copy the data to model 39 | bn.bias.data.copy_(bn_biases) 40 | bn.weight.data.copy_(bn_weights) 41 | bn.running_mean.copy_(bn_running_mean) 42 | bn.running_var.copy_(bn_running_var) 43 | else: 44 | #Number of biases 45 | num_biases = conv.bias.numel() 46 | 47 | #Load the weights 48 | conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases]) 49 | ptr = ptr + num_biases 50 | 51 | #reshape the loaded weights according to the dims of the model weights 52 | conv_biases = conv_biases.view_as(conv.bias.data) 53 | 54 | #Finally copy the data 55 | conv.bias.data.copy_(conv_biases) 56 | 57 | #Let us load the weights for the Convolutional layers 58 | num_weights = conv.weight.numel() 59 | conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights]) 60 | ptr = ptr + num_weights 61 | 62 | conv_weights = conv_weights.view_as(conv.weight.data) 63 | conv.weight.data.copy_(conv_weights) 64 | return ptr 65 | 66 | def load_weights_darknet53(weightfile, yolov3): 67 | fp = open(weightfile, "rb") 68 | #The first 5 values are header information 69 | # 1. Major version number 70 | # 2. Minor Version Number 71 | # 3. Subversion number 72 | # 4. IMages seen 73 | header = np.fromfile(fp, dtype = np.int32, count = 5) 74 | weights = np.fromfile(fp, dtype = np.float32) 75 | print(len(weights)) 76 | ptr = 0 77 | first_conv = yolov3.conv 78 | bn = first_conv.bn 79 | conv = first_conv.conv 80 | # first conv copy 81 | ptr = copy_weights(bn, conv, ptr, weights) 82 | 83 | layers = [yolov3.layer1, yolov3.layer2, yolov3.layer3, yolov3.layer4, yolov3.layer5] 84 | for layer in layers: 85 | for i in range(len(layer)): 86 | if i == 0: 87 | bn = layer[i].bn 88 | conv = layer[i].conv 89 | ptr = copy_weights(bn, conv, ptr, weights) 90 | else: 91 | bn = layer[i].conv1.bn 92 | conv = layer[i].conv1.conv 93 | ptr = copy_weights(bn, conv, ptr, weights) 94 | bn = layer[i].conv2.bn 95 | conv = layer[i].conv2.conv 96 | ptr = copy_weights(bn, conv, ptr, weights) 97 | print(ptr) 98 | fp.close() 99 | 100 | def load_weights(weightfile, yolov3, version): 101 | if version == "voc" or version == "coco": 102 | load_weights_yolov3(weightfile, yolov3) 103 | elif version == "darknet53": 104 | load_weights_darknet53(weightfile, yolov3) 105 | 106 | def load_weights_yolov3(weightfile, yolov3): 107 | fp = open(weightfile, "rb") 108 | #The first 5 values are header information 109 | # 1. Major version number 110 | # 2. Minor Version Number 111 | # 3. Subversion number 112 | # 4, 5. 
IMages seen 113 | header = np.fromfile(fp, dtype = np.int32, count = 5) 114 | weights = np.fromfile(fp, dtype = np.float32) 115 | print(len(weights)) 116 | ptr = 0 117 | extractor = yolov3.extractor 118 | first_conv = extractor.conv 119 | bn = first_conv.bn 120 | conv = first_conv.conv 121 | # first conv copy 122 | ptr = copy_weights(bn, conv, ptr, weights) 123 | 124 | layers = [extractor.layer1, extractor.layer2, extractor.layer3, extractor.layer4, extractor.layer5] 125 | for layer in layers: 126 | for i in range(len(layer)): 127 | if i == 0: 128 | bn = layer[i].bn 129 | conv = layer[i].conv 130 | ptr = copy_weights(bn, conv, ptr, weights) 131 | else: 132 | bn = layer[i].conv1.bn 133 | conv = layer[i].conv1.conv 134 | ptr = copy_weights(bn, conv, ptr, weights) 135 | bn = layer[i].conv2.bn 136 | conv = layer[i].conv2.conv 137 | ptr = copy_weights(bn, conv, ptr, weights) 138 | predict_conv_list1 = yolov3.predict_conv_list1 139 | smooth_conv1 = yolov3.smooth_conv1 140 | predict_conv_list2 = yolov3.predict_conv_list2 141 | smooth_conv2 = yolov3.smooth_conv2 142 | predict_conv_list3 = yolov3.predict_conv_list3 143 | for i in range(len(predict_conv_list1)): 144 | if i == 6: 145 | bn = 0 146 | conv = predict_conv_list1[i] 147 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False) 148 | else: 149 | bn = predict_conv_list1[i].bn 150 | conv = predict_conv_list1[i].conv 151 | ptr = copy_weights(bn, conv, ptr, weights) 152 | bn = smooth_conv1.bn 153 | conv = smooth_conv1.conv 154 | ptr = copy_weights(bn, conv, ptr, weights) 155 | for i in range(len(predict_conv_list2)): 156 | if i == 6: 157 | bn = 0 158 | conv = predict_conv_list2[i] 159 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False) 160 | else: 161 | bn = predict_conv_list2[i].bn 162 | conv = predict_conv_list2[i].conv 163 | ptr = copy_weights(bn, conv, ptr, weights) 164 | bn = smooth_conv2.bn 165 | conv = smooth_conv2.conv 166 | ptr = copy_weights(bn, conv, ptr, weights) 167 | 168 | for i in range(len(predict_conv_list3)): 169 | if i == 6: 170 | bn = 0 171 | conv = predict_conv_list3[i] 172 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False) 173 | else: 174 | bn = predict_conv_list3[i].bn 175 | conv = predict_conv_list3[i].conv 176 | ptr = copy_weights(bn, conv, ptr, weights) 177 | print(ptr) 178 | fp.close() 179 | 180 | 181 | def arg_parse(): 182 | """ 183 | Parse arguments to the train module 184 | """ 185 | parser = argparse.ArgumentParser( 186 | description='Yolov3 pytorch Training') 187 | parser.add_argument('--input_wh', default=(416, 416), 188 | help='input size.') 189 | parser.add_argument('--version', '--version', default='coco', 190 | help='voc, coco, darknet53') 191 | parser.add_argument('--weights', default='./weights/yolov3.weights', help='pretrained base model') 192 | parser.add_argument('--save_name', default='./weights/convert_yolov3_coco.pth', help='save name') 193 | 194 | return parser.parse_args() 195 | 196 | def load_weights_darknet19(weightfile, darknet19): 197 | fp = open(weightfile, "rb") 198 | #The first 4 values are header information 199 | # 1. Major version number 200 | # 2. Minor Version Number 201 | # 3. Subversion number 202 | # 4. 
Images seen 203 | header = np.fromfile(fp, dtype = np.int32, count=4) 204 | weights = np.fromfile(fp, dtype = np.float32) 205 | ptr = 0 206 | first_conv = darknet19.conv 207 | bn = first_conv.bn 208 | conv = first_conv.conv 209 | # first conv copy 210 | ptr = copy_weights(bn, conv, ptr, weights) 211 | layers = [darknet19.layer1, darknet19.layer2, darknet19.layer3, darknet19.layer4, darknet19.layer5] 212 | for layer in layers: 213 | for i in range(len(layer)): 214 | if i == 0: 215 | pass 216 | else: 217 | bn = layer[i].bn 218 | conv = layer[i].conv 219 | ptr = copy_weights(bn, conv, ptr, weights) 220 | fp.close() 221 | 222 | if __name__ == '__main__': 223 | args = arg_parse() 224 | weightfile = args.weights 225 | input_wh = args.input_wh 226 | version = args.version 227 | save_name = args.save_name 228 | if version == "voc": 229 | cfg = voc_config 230 | yolov3 = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 231 | elif version == "coco": 232 | cfg = coco_config 233 | yolov3 = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 234 | elif version == "darknet53": 235 | cfg = voc_config 236 | num_blocks = [1,2,8,8,4] 237 | yolov3 = Darknet53(num_blocks) 238 | else: 239 | print("Unknown version!!!") 240 | import sys 241 | sys.exit() 242 | 243 | load_weights(weightfile, yolov3, version) 244 | # name = "convert_yolo_" + version + ".pth" 245 | # save_path = os.path.join("./weights", name) 246 | torch.save(yolov3.state_dict(), save_name) 247 | 248 | 249 | 250 | -------------------------------------------------------------------------------- /data/coco.py: -------------------------------------------------------------------------------- 1 | """COCO Dataset Classes 2 | 3 | Original author: Francisco Massa 4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py 5 | 6 | Updated by: Ellis Brown, Max deGroot 7 | """ 8 | import random # needed by the multiscale branch in __getitem__ 9 | import os 10 | import pickle 11 | import os.path 12 | import sys 13 | import torch 14 | import torch.utils.data as data 15 | import torchvision.transforms as transforms 16 | import cv2 17 | import numpy as np 18 | import json 19 | import uuid 20 | from .data_augment import preproc 21 | 22 | from pycocotools.coco import COCO 23 | from pycocotools.cocoeval import COCOeval 24 | from pycocotools import mask as COCOmask 25 | 26 | class COCOAnnotationTransform(object): 27 | """Transforms a COCO annotation into a Tensor of bbox coords and label index 28 | Initialized with a dictionary lookup of classnames to indexes 29 | 30 | Arguments: 31 | class_to_ind (dict, optional): dictionary lookup of classnames -> indexes 32 | (default: alphabetic indexing of VOC's 20 classes) 33 | keep_difficult (bool, optional): keep difficult instances or not 34 | (default: False) 35 | height (int): height 36 | width (int): width 37 | """ 38 | 39 | def __init__(self): 40 | pass 41 | 42 | def __call__(self, target, width, height): 43 | """ 44 | Arguments: 45 | target (annotation) : the target annotation to be made usable, 46 | will not be normalized 47 | Returns: 48 | a list containing lists of bounding boxes [bbox coords, class name] 49 | """ 50 | 51 | boxes = target[:,:-1].copy() 52 | labels = target[:,-1].copy() 53 | boxes[:, 0::2] /= width 54 | boxes[:, 1::2] /= height 55 | b_w = (boxes[:, 2] - boxes[:, 0])*1. 56 | b_h = (boxes[:, 3] - boxes[:, 1])*1.
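# drop degenerate boxes: keep only those whose shorter (normalized) side is greater than 0.01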
57 | mask_b= np.minimum(b_w, b_h) > 0.01 58 | boxes_t = boxes[mask_b] 59 | labels_t = labels[mask_b].copy() 60 | 61 | return boxes_t, labels_t 62 | 63 | 64 | class COCODetection(data.Dataset): 65 | 66 | """VOC Detection Dataset Object 67 | 68 | input is image, target is annotation 69 | 70 | Arguments: 71 | root (string): filepath to VOCdevkit folder. 72 | image_set (string): imageset to use (eg. 'train', 'val', 'test') 73 | transform (callable, optional): transformation to perform on the 74 | input image 75 | target_transform (callable, optional): transformation to perform on the 76 | target `annotation` 77 | (eg: take in caption string, return tensor of word indices) 78 | dataset_name (string, optional): which dataset to load 79 | (default: 'VOC2007') 80 | """ 81 | 82 | def __init__(self, root, image_sets, resize_wh, batch_size, multiscale=False, dataset_name='COCO'): 83 | self.root = root 84 | self.cache_path = os.path.join(self.root, 'cache') 85 | self.image_set = image_sets 86 | self.name = dataset_name 87 | self.resize_wh = resize_wh 88 | self.batch_size = batch_size 89 | self.multiscale = multiscale 90 | self.transform = preproc() 91 | self.ids = list() 92 | self.annotations = list() 93 | self._view_map = { 94 | 'minival2014' : 'val2014', # 5k val2014 subset 95 | 'valminusminival2014' : 'val2014', # val2014 \setminus minival2014 96 | 'test-dev2015' : 'test2015', 97 | } 98 | 99 | for (year, image_set) in image_sets: 100 | coco_name = image_set+year 101 | data_name = (self._view_map[coco_name] 102 | if coco_name in self._view_map 103 | else coco_name) 104 | annofile = self._get_ann_file(coco_name) 105 | _COCO = COCO(annofile) 106 | self._COCO = _COCO 107 | self.coco_name = coco_name 108 | cats = _COCO.loadCats(_COCO.getCatIds()) 109 | self._classes = tuple([c['name'] for c in cats]) 110 | self.num_classes = len(self._classes) 111 | self._class_to_ind = dict(zip(self._classes, range(self.num_classes))) 112 | self._class_to_coco_cat_id = dict(zip([c['name'] for c in cats], 113 | _COCO.getCatIds())) 114 | indexes = _COCO.getImgIds() 115 | self.image_indexes = indexes 116 | self.ids.extend([self.image_path_from_index(data_name, index) for index in indexes ]) 117 | if image_set.find('test') != -1: 118 | print('test set will not load annotations!') 119 | else: 120 | self.annotations.extend(self._load_coco_annotations(coco_name, indexes,_COCO)) 121 | 122 | 123 | 124 | def image_path_from_index(self, name, index): 125 | """ 126 | Construct an image path from the image's "index" identifier. 
127 | """ 128 | # Example image path for index=119993: 129 | # images/train2014/COCO_train2014_000000119993.jpg 130 | file_name = ('COCO_' + name + '_' + 131 | str(index).zfill(12) + '.jpg') 132 | image_path = os.path.join(self.root, 'images', 133 | name, file_name) 134 | assert os.path.exists(image_path), \ 135 | 'Path does not exist: {}'.format(image_path) 136 | return image_path 137 | 138 | 139 | def _get_ann_file(self, name): 140 | prefix = 'instances' if name.find('test') == -1 \ 141 | else 'image_info' 142 | return os.path.join(self.root, 'annotations', 143 | prefix + '_' + name + '.json') 144 | 145 | 146 | def _load_coco_annotations(self, coco_name, indexes, _COCO): 147 | cache_file=os.path.join(self.cache_path,coco_name+'_gt_roidb.pkl') 148 | if os.path.exists(cache_file): 149 | with open(cache_file, 'rb') as fid: 150 | roidb = pickle.load(fid) 151 | print('{} gt roidb loaded from {}'.format(coco_name,cache_file)) 152 | return roidb 153 | 154 | gt_roidb = [self._annotation_from_index(index, _COCO) 155 | for index in indexes] 156 | with open(cache_file, 'wb') as fid: 157 | pickle.dump(gt_roidb,fid,pickle.HIGHEST_PROTOCOL) 158 | print('wrote gt roidb to {}'.format(cache_file)) 159 | return gt_roidb 160 | 161 | 162 | def _annotation_from_index(self, index, _COCO): 163 | """ 164 | Loads COCO bounding-box instance annotations. Crowd instances are 165 | handled by marking their overlaps (with all categories) to -1. This 166 | overlap value means that crowd "instances" are excluded from training. 167 | """ 168 | im_ann = _COCO.loadImgs(index)[0] 169 | width = im_ann['width'] 170 | height = im_ann['height'] 171 | 172 | annIds = _COCO.getAnnIds(imgIds=index, iscrowd=None) 173 | objs = _COCO.loadAnns(annIds) 174 | # Sanitize bboxes -- some are invalid 175 | valid_objs = [] 176 | for obj in objs: 177 | x1 = np.max((0, obj['bbox'][0])) 178 | y1 = np.max((0, obj['bbox'][1])) 179 | x2 = np.min((width - 1, x1 + np.max((0, obj['bbox'][2] - 1)))) 180 | y2 = np.min((height - 1, y1 + np.max((0, obj['bbox'][3] - 1)))) 181 | if obj['area'] > 0 and x2 >= x1 and y2 >= y1: 182 | obj['clean_bbox'] = [x1, y1, x2, y2] 183 | valid_objs.append(obj) 184 | objs = valid_objs 185 | num_objs = len(objs) 186 | 187 | res = np.zeros((num_objs, 5)) 188 | 189 | # Lookup table to map from COCO category ids to our internal class 190 | # indices 191 | coco_cat_id_to_class_ind = dict([(self._class_to_coco_cat_id[cls], 192 | self._class_to_ind[cls]) 193 | for cls in self._classes]) 194 | 195 | for ix, obj in enumerate(objs): 196 | cls = coco_cat_id_to_class_ind[obj['category_id']] 197 | res[ix, 0:4] = obj['clean_bbox'] 198 | res[ix, 4] = cls 199 | 200 | return res 201 | 202 | 203 | 204 | def __getitem__(self, index): 205 | img_id = self.ids[index] 206 | target = self.annotations[index] 207 | if self.multiscale: 208 | if index % (self.batch_size * 20) == 0: 209 | rnd = (random.randint(0,9) + 10) * 32 210 | print("resize scale", index, rnd) 211 | self.resize_wh = (rnd, rnd) 212 | img = cv2.imread(img_id, cv2.IMREAD_COLOR) 213 | height, width, _ = img.shape 214 | 215 | if self.transform is not None: 216 | img, target = self.transform(img, target, self.resize_wh) 217 | 218 | return img, target 219 | 220 | def __len__(self): 221 | return len(self.ids) 222 | 223 | def pull_image(self, index): 224 | '''Returns the original image object at index in PIL form 225 | 226 | Note: not using self.__getitem__(), as any transformations passed in 227 | could mess up this functionality. 
228 | 229 | Argument: 230 | index (int): index of img to show 231 | Return: 232 | PIL img 233 | ''' 234 | img_id = self.ids[index] 235 | return cv2.imread(img_id, cv2.IMREAD_COLOR), img_id 236 | 237 | 238 | def pull_tensor(self, index): 239 | '''Returns the original image at an index in tensor form 240 | 241 | Note: not using self.__getitem__(), as any transformations passed in 242 | could mess up this functionality. 243 | 244 | Argument: 245 | index (int): index of img to show 246 | Return: 247 | tensorized version of img, squeezed 248 | ''' 249 | to_tensor = transforms.ToTensor() 250 | return torch.Tensor(self.pull_image(index)).unsqueeze_(0) 251 | 252 | def _print_detection_eval_metrics(self, coco_eval): 253 | IoU_lo_thresh = 0.5 254 | IoU_hi_thresh = 0.95 255 | def _get_thr_ind(coco_eval, thr): 256 | ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) & 257 | (coco_eval.params.iouThrs < thr + 1e-5))[0][0] 258 | iou_thr = coco_eval.params.iouThrs[ind] 259 | assert np.isclose(iou_thr, thr) 260 | return ind 261 | 262 | ind_lo = _get_thr_ind(coco_eval, IoU_lo_thresh) 263 | ind_hi = _get_thr_ind(coco_eval, IoU_hi_thresh) 264 | # precision has dims (iou, recall, cls, area range, max dets) 265 | # area range index 0: all area ranges 266 | # max dets index 2: 100 per image 267 | precision = \ 268 | coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2] 269 | ap_default = np.mean(precision[precision > -1]) 270 | print('~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] ' 271 | '~~~~'.format(IoU_lo_thresh, IoU_hi_thresh)) 272 | print('{:.1f}'.format(100 * ap_default)) 273 | for cls_ind, cls in enumerate(self._classes): 274 | if cls == '__background__': 275 | continue 276 | # minus 1 because of __background__ 277 | precision = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2] 278 | ap = np.mean(precision[precision > -1]) 279 | print('{:.1f}'.format(100 * ap)) 280 | 281 | print('~~~~ Summary metrics ~~~~') 282 | coco_eval.summarize() 283 | 284 | def _do_detection_eval(self, res_file, output_dir): 285 | ann_type = 'bbox' 286 | coco_dt = self._COCO.loadRes(res_file) 287 | coco_eval = COCOeval(self._COCO, coco_dt) 288 | coco_eval.params.useSegm = (ann_type == 'segm') 289 | coco_eval.evaluate() 290 | coco_eval.accumulate() 291 | self._print_detection_eval_metrics(coco_eval) 292 | eval_file = os.path.join(output_dir, 'detection_results.pkl') 293 | with open(eval_file, 'wb') as fid: 294 | pickle.dump(coco_eval, fid, pickle.HIGHEST_PROTOCOL) 295 | print('Wrote COCO eval results to: {}'.format(eval_file)) 296 | 297 | def _coco_results_one_category(self, boxes, cat_id): 298 | results = [] 299 | for im_ind, index in enumerate(self.image_indexes): 300 | # print(type(boxes[im_ind])) 301 | # print(boxes[im_ind]) 302 | # dets = boxes[im_ind].astype(np.float) 303 | dets = boxes[im_ind] 304 | if dets == []: 305 | continue 306 | dets = boxes[im_ind].astype(np.float) 307 | scores = dets[:, -1] 308 | xs = dets[:, 0] 309 | ys = dets[:, 1] 310 | ws = dets[:, 2] - xs + 1 311 | hs = dets[:, 3] - ys + 1 312 | results.extend( 313 | [{'image_id' : index, 314 | 'category_id' : cat_id, 315 | 'bbox' : [xs[k], ys[k], ws[k], hs[k]], 316 | 'score' : scores[k]} for k in range(dets.shape[0])]) 317 | return results 318 | 319 | def _write_coco_results_file(self, all_boxes, res_file): 320 | # [{"image_id": 42, 321 | # "category_id": 18, 322 | # "bbox": [258.15,41.29,348.26,243.78], 323 | # "score": 0.236}, ...] 
324 | results = [] 325 | for cls_ind, cls in enumerate(self._classes): 326 | if cls == '__background__': 327 | continue 328 | print('Collecting {} results ({:d}/{:d})'.format(cls, cls_ind, 329 | self.num_classes )) 330 | coco_cat_id = self._class_to_coco_cat_id[cls] 331 | results.extend(self._coco_results_one_category(all_boxes[cls_ind], 332 | coco_cat_id)) 333 | ''' 334 | if cls_ind ==30: 335 | res_f = res_file+ '_1.json' 336 | print('Writing results json to {}'.format(res_f)) 337 | with open(res_f, 'w') as fid: 338 | json.dump(results, fid) 339 | results = [] 340 | ''' 341 | #res_f2 = res_file+'_2.json' 342 | print('Writing results json to {}'.format(res_file)) 343 | with open(res_file, 'w') as fid: 344 | json.dump(results, fid) 345 | 346 | def evaluate_detections(self, all_boxes, output_dir): 347 | res_file = os.path.join(output_dir, ('detections_' + 348 | self.coco_name + 349 | '_results')) 350 | res_file += '.json' 351 | self._write_coco_results_file(all_boxes, res_file) 352 | # Only do evaluation on non-test sets 353 | if self.coco_name.find('test') == -1: 354 | self._do_detection_eval(res_file, output_dir) 355 | # Optionally cleanup results json file 356 | 357 | -------------------------------------------------------------------------------- /data/config.py: -------------------------------------------------------------------------------- 1 | # config.py 2 | import os 3 | import os.path 4 | 5 | pwd = os.getcwd() 6 | VOCroot = os.path.join(pwd, "data/datasets/VOCdevkit0712/") 7 | COCOroot = os.path.join(pwd, "data/datasets/coco2015") 8 | 9 | datasets_dict = {"VOC": [('0712', '0712_trainval')], 10 | "VOC0712++": [('0712', '0712_trainval_test')], 11 | "VOC2012" : [('2012', '2012_trainval')], 12 | "COCO": [('2014', 'train'), ('2014', 'valminusminival')], 13 | "VOC2007": [('0712', "2007_test")], 14 | "COCOval": [('2014', 'minival')]} 15 | 16 | 17 | voc_config = { 18 | 'anchors' : [[116, 90], [156, 198], [373, 326], 19 | [30, 61], [62, 45], [59, 119], 20 | [10, 13], [16, 30], [33, 23]], 21 | 'root': VOCroot, 22 | 'num_classes': 20, 23 | 'multiscale' : True, 24 | 'name_path' : "./model/voc.names", 25 | 'anchors_mask' : [[0,1,2], [3,4,5], [6,7,8]] 26 | } 27 | 28 | coco_config = { 29 | 'anchors' : [[116, 90], [156, 198], [373, 326], 30 | [30, 61], [62, 45], [59, 119], 31 | [10, 13], [16, 30], [33, 23]], 32 | 'root': COCOroot, 33 | 'num_classes': 80, 34 | 'multiscale' : True, 35 | 'name_path' : "./model/coco.names", 36 | 'anchors_mask' : [[0,1,2], [3,4,5], [6,7,8]] 37 | } 38 | 39 | # anchors = [[214,327], [326,193], [359,359], 40 | # [116,286], [122,97], [171,180], 41 | # [24,34], [46,84], [68,185]] -------------------------------------------------------------------------------- /data/data_augment.py: -------------------------------------------------------------------------------- 1 | """Data augmentation functionality. Passed as callable transformations to 2 | Dataset classes. 
3 | 4 | The data augmentation procedures were interpreted from @weiliu89's SSD paper 5 | http://arxiv.org/abs/1512.02325 6 | 7 | TODO: implement data_augment for training 8 | 9 | Ellis Brown, Max deGroot 10 | """ 11 | 12 | import torch 13 | from torchvision import transforms 14 | import cv2 15 | import numpy as np 16 | import random 17 | import math 18 | 19 | 20 | def matrix_iou(a,b): 21 | """ 22 | return iou of a and b, numpy version for data augenmentation 23 | """ 24 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2]) 25 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) 26 | 27 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2) 28 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) 29 | area_b = np.prod(b[:, 2:] - b[:, :2], axis=1) 30 | return area_i / (area_a[:, np.newaxis] + area_b - area_i) 31 | 32 | def _crop(image, boxes, labels): 33 | height, width, _ = image.shape 34 | 35 | if len(boxes)== 0: 36 | return image, boxes, labels 37 | 38 | while True: 39 | mode = random.choice(( 40 | None, 41 | (0.3, None), 42 | (0.5, None), 43 | (0.7, None), 44 | (0.9, None), 45 | # (None, None), 46 | )) 47 | 48 | if mode is None: 49 | return image, boxes, labels 50 | 51 | min_iou, max_iou = mode 52 | if min_iou is None: 53 | min_iou = float('-inf') 54 | if max_iou is None: 55 | max_iou = float('inf') 56 | 57 | for _ in range(50): 58 | scale = random.uniform(0.3,1.) 59 | min_ratio = max(0.5, scale*scale) 60 | max_ratio = min(2, 1. / scale / scale) 61 | ratio = math.sqrt(random.uniform(min_ratio, max_ratio)) 62 | w = int(scale * ratio * width) 63 | h = int((scale / ratio) * height) 64 | 65 | 66 | l = random.randrange(width - w) 67 | t = random.randrange(height - h) 68 | roi = np.array((l, t, l + w, t + h)) 69 | 70 | iou = matrix_iou(boxes, roi[np.newaxis]) 71 | 72 | if not (min_iou <= iou.min() and iou.max() <= max_iou): 73 | continue 74 | 75 | image_t = image[roi[1]:roi[3], roi[0]:roi[2]] 76 | 77 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2 78 | mask = np.logical_and(roi[:2] < centers, centers < roi[2:]) \ 79 | .all(axis=1) 80 | boxes_t = boxes[mask].copy() 81 | labels_t = labels[mask].copy() 82 | if len(boxes_t) == 0: 83 | continue 84 | 85 | boxes_t[:, :2] = np.maximum(boxes_t[:, :2], roi[:2]) 86 | boxes_t[:, :2] -= roi[:2] 87 | boxes_t[:, 2:] = np.minimum(boxes_t[:, 2:], roi[2:]) 88 | boxes_t[:, 2:] -= roi[:2] 89 | 90 | return image_t, boxes_t,labels_t 91 | 92 | 93 | def _distort(image): 94 | def _convert(image, alpha=1, beta=0): 95 | tmp = image.astype(float) * alpha + beta 96 | tmp[tmp < 0] = 0 97 | tmp[tmp > 255] = 255 98 | image[:] = tmp 99 | 100 | image = image.copy() 101 | 102 | if random.randrange(2): 103 | _convert(image, beta=random.uniform(-32, 32)) 104 | 105 | if random.randrange(2): 106 | _convert(image, alpha=random.uniform(0.5, 1.5)) 107 | 108 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 109 | 110 | if random.randrange(2): 111 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18) 112 | tmp %= 180 113 | image[:, :, 0] = tmp 114 | 115 | if random.randrange(2): 116 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5)) 117 | 118 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 119 | 120 | return image 121 | 122 | 123 | def _expand(image, boxes,fill, p): 124 | if random.random() > p: 125 | return image, boxes 126 | 127 | height, width, depth = image.shape 128 | for _ in range(50): 129 | scale = random.uniform(1,4) 130 | 131 | min_ratio = max(0.5, 1./scale/scale) 132 | max_ratio = min(2, scale*scale) 133 | ratio = math.sqrt(random.uniform(min_ratio, max_ratio)) 134 | ws 
= scale * ratio 135 | hs = scale / ratio 136 | if ws < 1 or hs < 1: 137 | continue 138 | w = int(ws * width) 139 | h = int(hs * height) 140 | 141 | left = random.randint(0, w - width) 142 | top = random.randint(0, h - height) 143 | 144 | boxes_t = boxes.copy() 145 | boxes_t[:, :2] += (left, top) 146 | boxes_t[:, 2:] += (left, top) 147 | 148 | 149 | expand_image = np.empty( 150 | (h, w, depth), 151 | dtype=image.dtype) 152 | expand_image[:, :] = fill 153 | expand_image[top:top + height, left:left + width] = image 154 | image = expand_image 155 | 156 | return image, boxes_t 157 | 158 | 159 | def _mirror(image, boxes): 160 | _, width, _ = image.shape 161 | if random.randrange(2): 162 | image = image[:, ::-1] 163 | boxes = boxes.copy() 164 | boxes[:, 0::2] = width - boxes[:, 2::-2] 165 | return image, boxes 166 | 167 | def rand(a=0, b=1): 168 | return np.random.rand()*(b-a) + a 169 | 170 | # def random_letterbox_image(img, resize_wh, boxes, jitter=0.3): 171 | # '''resize image with unchanged aspect ratio using padding''' 172 | # img_w, img_h = img.shape[1], img.shape[0] 173 | # w, h = resize_wh 174 | # new_ar = w / h * rand(1-jitter, 1+jitter)/rand(1-jitter, 1+jitter) 175 | # scale = rand(.25, 2) 176 | # if new_ar < 1: 177 | # nh = int(scale * h) 178 | # nw = int(nh * new_ar) 179 | # else: 180 | # nw = int(scale * w) 181 | # nh = int(nw / new_ar) 182 | # resized_image = cv2.resize(img, (nw, nh), interpolation = cv2.INTER_CUBIC) 183 | 184 | # dx = int(rand(0, w - nw)) 185 | # dy = int(rand(0, h - nh)) 186 | 187 | # if (w - nw) < 0: 188 | # cxmin = 0 189 | # xmin = nw - w + dx 190 | # xmax = nw + dx 191 | # cxmax = xmax - xmin 192 | # else: 193 | # cxmin = dx 194 | # xmin = 0 195 | # xmax = nw 196 | # cxmax = nw + dx 197 | # if (h - nh) < 0: 198 | # cymin = 0 199 | # ymin = nh - h + dy 200 | # ymax = nh + dy 201 | # cymax = ymax - ymin 202 | # else: 203 | # cymin = dy 204 | # ymin = 0 205 | # ymax = nh 206 | # cymax = nh + dy 207 | 208 | # resized_image = resized_image[ymin:ymax,xmin:xmax,:] 209 | 210 | # boxes[:, 0::2] = (boxes[:, 0::2] * nw / img_w + dx) / w 211 | # boxes[:, 1::2] = (boxes[:, 1::2] * nh / img_h + dy ) / h 212 | # # clamp boxes 213 | # boxes[:, 0:2][boxes[:, 0:2]<=0] = 0 214 | # boxes[:, 2:][boxes[:, 2:]>=1] = 0.9999 215 | 216 | # canvas = np.full((resize_wh[1], resize_wh[0], 3), 128) 217 | # canvas[cymin:cymax, cxmin:cxmax, :] = resized_image 218 | 219 | # img_ = canvas[:,:,::-1].transpose((2,0,1)).copy() 220 | # img_ = torch.from_numpy(img_).float().div(255.0) 221 | # return img_, boxes 222 | 223 | def letterbox_image(img, resize_wh, boxes): 224 | '''resize image with unchanged aspect ratio using padding''' 225 | img_w, img_h = img.shape[1], img.shape[0] 226 | w, h = resize_wh 227 | new_w = int(img_w * min(w/img_w, h/img_h)) 228 | new_h = int(img_h * min(w/img_w, h/img_h)) 229 | 230 | if len(boxes) > 0: 231 | boxes = boxes.copy() 232 | dim_diff = np.abs(img_w - img_h) 233 | max_size = max(img_w, img_h) 234 | if img_w > img_h: 235 | boxes[:, 1::2] += dim_diff // 2 236 | else: 237 | boxes[:, 0::2] += dim_diff // 2 238 | boxes[:, 0::2] /= max_size 239 | boxes[:, 1::2] /= max_size 240 | resized_image = cv2.resize(img, (new_w, new_h), interpolation = cv2.INTER_CUBIC) 241 | canvas = np.full((resize_wh[0], resize_wh[1], 3), 128) 242 | canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image 243 | img_ = canvas[:,:,::-1].transpose((2,0,1)).copy() 244 | img_ = torch.from_numpy(img_).float().div(255.0) 245 | 246 | return img_, boxes 247 | 248 | 249 
| class preproc(object): 250 | 251 | def __init__(self): 252 | self.means = [128, 128, 128] 253 | self.p = 0.5 254 | 255 | def __call__(self, image, targets, resize_wh, use_pad=True): 256 | boxes = targets[:, :-1].copy() 257 | labels = targets[:, -1].copy() 258 | height, width, _ = image.shape 259 | if len(boxes) == 0: 260 | targets = np.zeros((1,5)) 261 | image, _ = letterbox_image(image, resize_wh, boxes) 262 | return image, targets 263 | image_o = image.copy() 264 | targets_o = targets.copy() 265 | image_t, boxes, labels = _crop(image, boxes, labels) 266 | image_t = _distort(image_t) 267 | image_t, boxes = _expand(image_t, boxes, self.means, self.p) 268 | image_t, boxes = _mirror(image_t, boxes) 269 | image_t, boxes = letterbox_image(image_t, resize_wh, boxes) 270 | 271 | boxes = boxes.copy() 272 | b_w = (boxes[:, 2] - boxes[:, 0])*1. 273 | b_h = (boxes[:, 3] - boxes[:, 1])*1. 274 | mask_b= np.minimum(b_w, b_h) > 0.01 275 | boxes_t = boxes[mask_b] 276 | labels_t = labels[mask_b].copy() 277 | 278 | if len(boxes_t) == 0: 279 | boxes_t = targets_o[:, :4].copy() 280 | labels_t = targets_o[:, -1].copy() 281 | image_t, boxes_t = letterbox_image(image_o, resize_wh, boxes_t) 282 | 283 | boxes_t[:, 0::2] *= resize_wh[0] 284 | boxes_t[:, 1::2] *= resize_wh[1] 285 | 286 | labels_t = np.expand_dims(labels_t, 1) 287 | targets_t = np.hstack((boxes_t, labels_t)) 288 | 289 | return image_t, targets_t 290 | 291 | 292 | -------------------------------------------------------------------------------- /data/voc0712.py: -------------------------------------------------------------------------------- 1 | """VOC Dataset Classes 2 | 3 | Original author: Francisco Massa 4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py 5 | 6 | Updated by: Ellis Brown, Max deGroot 7 | """ 8 | 9 | import os 10 | import os.path 11 | import pickle 12 | import sys 13 | import torch 14 | import torch.utils.data as data 15 | from PIL import Image, ImageDraw, ImageFont 16 | import cv2 17 | import numpy as np 18 | from .voc_eval import voc_eval 19 | import random 20 | from .data_augment import preproc 21 | if sys.version_info[0] == 2: 22 | import xml.etree.cElementTree as ET 23 | else: 24 | import xml.etree.ElementTree as ET 25 | 26 | VOC_CLASSES = ( 27 | 'aeroplane', 'bicycle', 'bird', 'boat', 28 | 'bottle', 'bus', 'car', 'cat', 'chair', 29 | 'cow', 'diningtable', 'dog', 'horse', 30 | 'motorbike', 'person', 'pottedplant', 31 | 'sheep', 'sofa', 'train', 'tvmonitor') 32 | 33 | # for making bounding boxes pretty 34 | COLORS = ((255, 0, 0, 128), (0, 255, 0, 128), (0, 0, 255, 128), 35 | (0, 255, 255, 128), (255, 0, 255, 128), (255, 255, 0, 128)) 36 | 37 | 38 | class VOCAnnotationTransform(object): 39 | """Transforms a VOC annotation into a Tensor of bbox coords and label index 40 | Initilized with a dictionary lookup of classnames to indexes 41 | 42 | Arguments: 43 | class_to_ind (dict, optional): dictionary lookup of classnames -> indexes 44 | (default: alphabetic indexing of VOC's 20 classes) 45 | keep_difficult (bool, optional): keep difficult instances or not 46 | (default: False) 47 | height (int): height 48 | width (int): width 49 | """ 50 | 51 | def __init__(self, class_to_ind=None, keep_difficult=False): 52 | self.class_to_ind = class_to_ind or dict( 53 | zip(VOC_CLASSES, range(len(VOC_CLASSES)))) 54 | self.keep_difficult = keep_difficult 55 | 56 | def __call__(self, target, width, height): 57 | """ 58 | Arguments: 59 | target (annotation) : the target annotation to be made usable 60 | will be an 
ET.Element 61 | Returns: 62 | a list containing lists of bounding boxes [bbox coords, class name] 63 | """ 64 | res = np.empty((0,5)) 65 | for obj in target.iter('object'): 66 | difficult = int(obj.find('difficult').text) == 1 67 | if not self.keep_difficult and difficult: 68 | continue 69 | name = obj.find('name').text.lower().strip() 70 | bbox = obj.find('bndbox') 71 | 72 | pts = ['xmin', 'ymin', 'xmax', 'ymax'] 73 | bndbox = [] 74 | for i, pt in enumerate(pts): 75 | cur_pt = int(bbox.find(pt).text) 76 | bndbox.append(cur_pt) 77 | label_idx = self.class_to_ind[name] 78 | bndbox.append(label_idx) 79 | res = np.vstack((res, bndbox)) 80 | return res 81 | 82 | 83 | class VOCDetection(data.Dataset): 84 | """VOC Detection Dataset Object 85 | 86 | input is image, target is annotation 87 | 88 | Arguments: 89 | root (string): filepath to VOCdevkit folder. 90 | image_set (string): imageset to use (eg. 'train', 'val', 'test') 91 | transform (callable, optional): transformation to perform on the 92 | input image 93 | target_transform (callable, optional): transformation to perform on the 94 | target `annotation` 95 | (eg: take in caption string, return tensor of word indices) 96 | dataset_name (string, optional): which dataset to load 97 | (default: 'VOC2007') 98 | """ 99 | 100 | def __init__(self, root, image_sets, resize_wh, batch_size=16, multiscale=False, dataset_name='VOC0712'): 101 | self.root = root 102 | self.image_set = image_sets 103 | self.transform = preproc() 104 | self.resize_wh = resize_wh 105 | self.target_transform = VOCAnnotationTransform() 106 | self.name = dataset_name 107 | self.multiscale = multiscale 108 | self.batch_size = batch_size 109 | self._annopath = os.path.join('%s', 'Annotations', '%s.xml') 110 | self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg') 111 | self.ids = list() 112 | for (year, name) in image_sets: 113 | self._year = year 114 | rootpath = os.path.join(self.root, 'VOC' + year) 115 | for line in open(os.path.join(rootpath, 'ImageSets', 'Main', name + '.txt')): 116 | self.ids.append((rootpath, line.strip())) 117 | 118 | def __getitem__(self, index): 119 | img_id = self.ids[index] 120 | # multiscale train 121 | if self.multiscale: 122 | if index % (self.batch_size * 20) == 0: 123 | rnd = (random.randint(0,9) + 10) * 32 124 | print("resize scale", index, rnd) 125 | self.resize_wh = (rnd, rnd) 126 | target = ET.parse(self._annopath % img_id).getroot() 127 | img = cv2.imread(self._imgpath % img_id) 128 | height, width, channels = img.shape 129 | if self.target_transform is not None: 130 | target = self.target_transform(target, width, height) 131 | 132 | if self.transform is not None: 133 | img, target = self.transform(img, target, self.resize_wh) 134 | 135 | 136 | return img, target 137 | 138 | 139 | def __len__(self): 140 | return len(self.ids) 141 | 142 | def pull_image(self, index): 143 | '''Returns the original image object at index in PIL form 144 | 145 | Note: not using self.__getitem__(), as any transformations passed in 146 | could mess up this functionality. 147 | 148 | Argument: 149 | index (int): index of img to show 150 | Return: 151 | PIL img 152 | ''' 153 | img_id = self.ids[index] 154 | return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR), img_id 155 | 156 | def pull_anno(self, index): 157 | '''Returns the original annotation of image at index 158 | 159 | Note: not using self.__getitem__(), as any transformations passed in 160 | could mess up this functionality. 
161 | 162 | Argument: 163 | index (int): index of img to get annotation of 164 | Return: 165 | list: [img_id, [(label, bbox coords),...]] 166 | eg: ('001718', [('dog', (96, 13, 438, 332))]) 167 | ''' 168 | img_id = self.ids[index] 169 | anno = ET.parse(self._annopath % img_id).getroot() 170 | gt = self.target_transform(anno, 1, 1) 171 | return img_id[1], gt 172 | 173 | def pull_tensor(self, index): 174 | '''Returns the original image at an index in tensor form 175 | 176 | Note: not using self.__getitem__(), as any transformations passed in 177 | could mess up this functionality. 178 | 179 | Argument: 180 | index (int): index of img to show 181 | Return: 182 | tensorized version of img, squeezed 183 | ''' 184 | return torch.Tensor(self.pull_image(index)).unsqueeze_(0) 185 | 186 | def evaluate_detections(self, all_boxes, output_dir=None): 187 | """ 188 | all_boxes is a list of length number-of-classes. 189 | Each list element is a list of length number-of-images. 190 | Each of those list elements is either an empty list [] 191 | or a numpy array of detection. 192 | 193 | all_boxes[class][image] = [] or np.array of shape #dets x 5 194 | """ 195 | self._write_voc_results_file(all_boxes) 196 | self._do_python_eval(output_dir) 197 | 198 | def _get_voc_results_file_template(self): 199 | filename = 'comp3_det_test' + '_{:s}.txt' 200 | filedir = os.path.join( 201 | self.root, 'results', 'VOC' + self._year, 'Main') 202 | if not os.path.exists(filedir): 203 | os.makedirs(filedir) 204 | path = os.path.join(filedir, filename) 205 | return path 206 | 207 | def _write_voc_results_file(self, all_boxes): 208 | for cls_ind, cls in enumerate(VOC_CLASSES): 209 | print('Writing {} VOC results file'.format(cls)) 210 | filename = self._get_voc_results_file_template().format(cls) 211 | # print(filename) 212 | with open(filename, 'wt') as f: 213 | for im_ind, index in enumerate(self.ids): 214 | index = index[1] 215 | dets = all_boxes[cls_ind][im_ind] 216 | if dets == []: 217 | continue 218 | for k in range(dets.shape[0]): 219 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 220 | format(index, dets[k, -1], 221 | dets[k, 0] + 1, dets[k, 1] + 1, 222 | dets[k, 2] + 1, dets[k, 3] + 1)) 223 | 224 | def _do_python_eval(self, output_dir='output'): 225 | rootpath = os.path.join(self.root, 'VOC' + self._year) 226 | name = self.image_set[0][1] 227 | annopath = os.path.join( 228 | rootpath, 229 | 'Annotations', 230 | '{:s}.xml') 231 | imagesetfile = os.path.join( 232 | rootpath, 233 | 'ImageSets', 234 | 'Main', 235 | name+'.txt') 236 | cachedir = os.path.join(self.root, 'annotations_cache') 237 | aps = [] 238 | # The PASCAL VOC metric changed in 2010 239 | use_07_metric = True if int(self._year) < 2010 else False 240 | # use_07_metric = True 241 | print('VOC07 metric? 
' + ('Yes' if use_07_metric else 'No')) 242 | if output_dir is not None and not os.path.isdir(output_dir): 243 | os.mkdir(output_dir) 244 | for i, cls in enumerate(VOC_CLASSES): 245 | 246 | filename = self._get_voc_results_file_template().format(cls) 247 | rec, prec, ap = voc_eval( 248 | filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5, 249 | use_07_metric=use_07_metric) 250 | aps += [ap] 251 | print('AP for {} = {:.4f}'.format(cls, ap)) 252 | if output_dir is not None: 253 | with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f: 254 | pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f) 255 | print('Mean AP = {:.4f}'.format(np.mean(aps))) 256 | print('~~~~~~~~') 257 | print('Results:') 258 | for ap in aps: 259 | print('{:.3f}'.format(ap)) 260 | print('{:.3f}'.format(np.mean(aps))) 261 | print('~~~~~~~~') 262 | print('') 263 | print('--------------------------------------------------------------') 264 | print('Results computed with the **unofficial** Python eval code.') 265 | print('Results should be very close to the official MATLAB eval code.') 266 | print('Recompute with `./tools/reval.py --matlab ...` for your paper.') 267 | print('-- Thanks, The Management') 268 | print('--------------------------------------------------------------') 269 | 270 | 271 | def detection_collate(batch): 272 | """Custom collate fn for dealing with batches of images that have a different 273 | number of associated object annotations (bounding boxes). 274 | 275 | Arguments: 276 | batch: (tuple) A tuple of tensor images and lists of annotations 277 | 278 | Return: 279 | A tuple containing: 280 | 1) (tensor) batch of images stacked on their 0 dim 281 | 2) (list of tensors) annotations for a given image are stacked on 0 dim 282 | """ 283 | targets = [] 284 | imgs = [] 285 | for sample in batch: 286 | imgs.append(sample[0]) 287 | targets.append(torch.FloatTensor(sample[1])) 288 | return torch.stack(imgs, 0), targets 289 | 290 | -------------------------------------------------------------------------------- /data/voc_eval.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast/er R-CNN 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # Written by Bharath Hariharan 5 | # -------------------------------------------------------- 6 | 7 | import xml.etree.ElementTree as ET 8 | import os 9 | import pickle 10 | import numpy as np 11 | import pdb 12 | 13 | 14 | def parse_rec(filename): 15 | """ Parse a PASCAL VOC xml file """ 16 | tree = ET.parse(filename) 17 | objects = [] 18 | for obj in tree.findall('object'): 19 | obj_struct = {} 20 | obj_struct['name'] = obj.find('name').text 21 | obj_struct['pose'] = obj.find('pose').text 22 | obj_struct['truncated'] = int(obj.find('truncated').text) 23 | obj_struct['difficult'] = int(obj.find('difficult').text) 24 | bbox = obj.find('bndbox') 25 | obj_struct['bbox'] = [int(bbox.find('xmin').text), 26 | int(bbox.find('ymin').text), 27 | int(bbox.find('xmax').text), 28 | int(bbox.find('ymax').text)] 29 | objects.append(obj_struct) 30 | 31 | return objects 32 | 33 | 34 | 35 | def voc_ap(rec, prec, use_07_metric=False): 36 | """ ap = voc_ap(rec, prec, [use_07_metric]) 37 | Compute VOC AP given precision and recall. 38 | If use_07_metric is true, uses the 39 | VOC 07 11 point method (default:False). 40 | """ 41 | if use_07_metric: 42 | # 11 point metric 43 | ap = 0. 
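# average the best precision found at the 11 recall levels 0.0, 0.1, ..., 1.0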
44 | for t in np.arange(0., 1.1, 0.1): 45 | if np.sum(rec >= t) == 0: 46 | p = 0 47 | else: 48 | p = np.max(prec[rec >= t]) 49 | ap = ap + p / 11. 50 | else: 51 | # correct AP calculation 52 | # first append sentinel values at the end 53 | mrec = np.concatenate(([0.], rec, [1.])) 54 | mpre = np.concatenate(([0.], prec, [0.])) 55 | 56 | # compute the precision envelope 57 | for i in range(mpre.size - 1, 0, -1): 58 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 59 | 60 | # to calculate area under PR curve, look for points 61 | # where X axis (recall) changes value 62 | i = np.where(mrec[1:] != mrec[:-1])[0] 63 | 64 | # and sum (\Delta recall) * prec 65 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 66 | return ap 67 | 68 | def voc_eval(detpath, 69 | annopath, 70 | imagesetfile, 71 | classname, 72 | cachedir, 73 | ovthresh=0.5, 74 | use_07_metric=False): 75 | """rec, prec, ap = voc_eval(detpath, 76 | annopath, 77 | imagesetfile, 78 | classname, 79 | [ovthresh], 80 | [use_07_metric]) 81 | 82 | Top level function that does the PASCAL VOC evaluation. 83 | 84 | detpath: Path to detections 85 | detpath.format(classname) should produce the detection results file. 86 | annopath: Path to annotations 87 | annopath.format(imagename) should be the xml annotations file. 88 | imagesetfile: Text file containing the list of images, one image per line. 89 | classname: Category name (duh) 90 | cachedir: Directory for caching the annotations 91 | [ovthresh]: Overlap threshold (default = 0.5) 92 | [use_07_metric]: Whether to use VOC07's 11 point AP computation 93 | (default False) 94 | """ 95 | # assumes detections are in detpath.format(classname) 96 | # assumes annotations are in annopath.format(imagename) 97 | # assumes imagesetfile is a text file with each line an image name 98 | # cachedir caches the annotations in a pickle file 99 | 100 | # first load gt 101 | if not os.path.isdir(cachedir): 102 | os.mkdir(cachedir) 103 | cachefile = os.path.join(cachedir, imagesetfile.split(".")[0]+'_annots.pkl') 104 | # read list of images 105 | with open(imagesetfile, 'r') as f: 106 | lines = f.readlines() 107 | imagenames = [x.strip() for x in lines] 108 | 109 | if not os.path.isfile(cachefile): 110 | # load annots 111 | recs = {} 112 | for i, imagename in enumerate(imagenames): 113 | recs[imagename] = parse_rec(annopath.format(imagename)) 114 | if i % 100 == 0: 115 | print('Reading annotation for {:d}/{:d}'.format( 116 | i + 1, len(imagenames))) 117 | # save 118 | print('Saving cached annotations to {:s}'.format(cachefile)) 119 | with open(cachefile, 'wb') as f: 120 | pickle.dump(recs, f) 121 | else: 122 | # load 123 | with open(cachefile, 'rb') as f: 124 | recs = pickle.load(f) 125 | 126 | # extract gt objects for this class 127 | class_recs = {} 128 | npos = 0 129 | for imagename in imagenames: 130 | R = [obj for obj in recs[imagename] if obj['name'] == classname] 131 | bbox = np.array([x['bbox'] for x in R]) 132 | difficult = np.array([x['difficult'] for x in R]).astype(np.bool) 133 | det = [False] * len(R) 134 | npos = npos + sum(~difficult) 135 | class_recs[imagename] = {'bbox': bbox, 136 | 'difficult': difficult, 137 | 'det': det} 138 | 139 | # read dets 140 | detfile = detpath.format(classname) 141 | with open(detfile, 'r') as f: 142 | lines = f.readlines() 143 | 144 | splitlines = [x.strip().split(' ') for x in lines] 145 | image_ids = [x[0] for x in splitlines] 146 | confidence = np.array([float(x[1]) for x in splitlines]) 147 | BB = np.array([[float(z) for z in x[2:]] for x in splitlines]) 148 | # 
sort by confidence 149 | sorted_ind = np.argsort(-confidence) 150 | sorted_scores = np.sort(-confidence) 151 | BB = BB[sorted_ind, :] 152 | image_ids = [image_ids[x] for x in sorted_ind] 153 | 154 | # go down dets and mark TPs and FPs 155 | nd = len(image_ids) 156 | tp = np.zeros(nd) 157 | fp = np.zeros(nd) 158 | for d in range(nd): 159 | R = class_recs[image_ids[d]] 160 | bb = BB[d, :].astype(float) 161 | ovmax = -np.inf 162 | BBGT = R['bbox'].astype(float) 163 | 164 | if BBGT.size > 0: 165 | # compute overlaps 166 | # intersection 167 | ixmin = np.maximum(BBGT[:, 0], bb[0]) 168 | iymin = np.maximum(BBGT[:, 1], bb[1]) 169 | ixmax = np.minimum(BBGT[:, 2], bb[2]) 170 | iymax = np.minimum(BBGT[:, 3], bb[3]) 171 | iw = np.maximum(ixmax - ixmin + 1., 0.) 172 | ih = np.maximum(iymax - iymin + 1., 0.) 173 | inters = iw * ih 174 | 175 | # union 176 | uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) + 177 | (BBGT[:, 2] - BBGT[:, 0] + 1.) * 178 | (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters) 179 | 180 | overlaps = inters / uni 181 | ovmax = np.max(overlaps) 182 | jmax = np.argmax(overlaps) 183 | 184 | if ovmax > ovthresh: 185 | if not R['difficult'][jmax]: 186 | if not R['det'][jmax]: 187 | tp[d] = 1. 188 | R['det'][jmax] = 1 189 | else: 190 | fp[d] = 1. 191 | else: 192 | fp[d] = 1. 193 | 194 | # compute precision recall 195 | fp = np.cumsum(fp) 196 | tp = np.cumsum(tp) 197 | rec = tp / float(npos) 198 | # avoid divide by zero in case the first detection matches a difficult 199 | # ground truth 200 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) 201 | ap = voc_ap(rec, prec, use_07_metric) 202 | 203 | return rec, prec, ap 204 | -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | from __future__ import division 5 | import time 6 | import torch 7 | import os 8 | os.environ["CUDA_VISIBLE_DEVICES"] = "1" 9 | import torch.nn as nn 10 | import torch.backends.cudnn as cudnn 11 | import numpy as np 12 | import cv2 13 | import argparse 14 | import os.path as osp 15 | import math 16 | from model.yolo import Yolov3 17 | from utils.box_utils import draw_rects, detection_postprecess 18 | from data.config import voc_config, coco_config 19 | from utils.preprocess import preproc_for_test 20 | 21 | def arg_parse(): 22 | """ 23 | Parse arguements to the detect module 24 | """ 25 | parser = argparse.ArgumentParser(description='YOLO v3 Detection Module') 26 | 27 | parser.add_argument("--images", dest = 'images', help = 28 | "Image / Directory containing images to perform detection upon",default = "images", type = str) 29 | parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.1) 30 | parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4) 31 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416]) 32 | parser.add_argument("--save_path", dest = "save_path", help = "coco name path", default = './output') 33 | parser.add_argument("--dataset", dest = "dataset", help = "VOC or COCO", default = 'VOC') 34 | parser.add_argument("--weights", dest = 'weights', 35 | help = "weightsfile", 36 | default = "./weights/convert_yolov3_coco.pth", type = str) 37 | parser.add_argument('--cuda', default=True, type=str, 38 | help='Use cuda to train model') 39 | parser.add_argument('--use_pad', default=True, 
type=str, 40 | help='Use pad to resize images') 41 | return parser.parse_args() 42 | 43 | 44 | if __name__ == '__main__': 45 | args = arg_parse() 46 | weightsfile = args.weights 47 | confidence = args.confidence 48 | nms_thresh = args.nms_thresh 49 | images = args.images 50 | input_wh = args.input_wh 51 | cuda = args.cuda 52 | use_pad = args.use_pad 53 | save_path = args.save_path 54 | dataset = args.dataset 55 | if dataset[0] == "V": 56 | cfg = voc_config 57 | elif dataset[1] == "C": 58 | cfg = coco_config 59 | else: 60 | print("only support VOC and COCO datasets !!!") 61 | name_path = cfg["name_path"] 62 | num_classes = cfg["num_classes"] 63 | anchors = cfg["anchors"] 64 | 65 | with open(name_path, "r") as f: 66 | classes = [i.strip() for i in f.readlines()] 67 | try: 68 | im_list = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)] 69 | except NotADirectoryError: 70 | im_list = [] 71 | im_list.append(osp.join(osp.realpath('.'), images)) 72 | except FileNotFoundError: 73 | print ("No file or directory with the name {}".format(images)) 74 | exit() 75 | 76 | net = Yolov3("test", input_wh, anchors, cfg["anchors_mask"], num_classes) 77 | state_dict = torch.load(weightsfile) 78 | from collections import OrderedDict 79 | new_state_dict = OrderedDict() 80 | for k, v in state_dict.items(): 81 | head = k[:7] 82 | if head == 'module.': 83 | name = k[7:] # remove `module.` 84 | else: 85 | name = k 86 | new_state_dict[name] = v 87 | if cuda: 88 | net.cuda() 89 | cudnn.benchmark = True 90 | net.load_state_dict(new_state_dict) 91 | print("load weights successfully.....") 92 | net.eval() 93 | for img_path in im_list[:]: 94 | print(img_path) 95 | img = cv2.imread(img_path) 96 | ori_img = img.copy() 97 | ori_wh = (img.shape[1], img.shape[0]) 98 | img = preproc_for_test(img, input_wh, use_pad) 99 | if cuda: 100 | img = img.cuda() 101 | st = time.time() 102 | detection = net(img) 103 | detect_time = time.time() 104 | detection = detection_postprecess(detection, confidence, num_classes, input_wh, ori_wh, use_pad=use_pad, nms_conf=nms_thresh) 105 | nms_time = time.time() 106 | draw_img = draw_rects(ori_img, detection, classes) 107 | draw_time = time.time() 108 | save_img_path = os.path.join(save_path, "output_" + img_path.split("/")[-1]) 109 | cv2.imwrite(save_img_path, draw_img) 110 | final_time = time.time() - st 111 | 112 | print("detection time:", round(detect_time - st, 3), "nms_time:", round(nms_time - detect_time, 3), "draw_time:", round(draw_time - nms_time, 3), "final_time:", round(final_time ,3)) 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | -------------------------------------------------------------------------------- /eval.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | from __future__ import division 5 | import time 6 | import torch 7 | import os 8 | os.environ["CUDA_VISIBLE_DEVICES"] = "0" 9 | import torch.nn as nn 10 | from torch.autograd import Variable 11 | import torch.backends.cudnn as cudnn 12 | import numpy as np 13 | import cv2 14 | import argparse 15 | import os.path as osp 16 | import math 17 | import pickle 18 | from model.yolo import Yolov3 19 | from data.voc0712 import VOCDetection, detection_collate 20 | from data.coco import COCODetection 21 | from data.config import voc_config, coco_config, datasets_dict 22 | from utils.box_utils import draw_rects, detection_postprecess 23 | from utils.timer import Timer 24 | from utils.preprocess import 
preproc_for_test 25 | 26 | def arg_parse(): 27 | """ 28 | Parse arguements to the detect module 29 | 30 | """ 31 | parser = argparse.ArgumentParser(description='YOLO v3 Detection Module') 32 | 33 | parser.add_argument('--dataset', default='VOC', 34 | help='VOC ,VOC0712++ or COCO dataset') 35 | parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4) 36 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416]) 37 | parser.add_argument("--weights", dest = 'weights', 38 | help = "weightsfile", 39 | default = "./weights/yolov3_COCO_epoches_10_0607.pth", type = str) 40 | parser.add_argument('--cuda', default=True, type=str, 41 | help='Use cuda to train model') 42 | parser.add_argument('--use_pad', default=True, type=str, 43 | help='Use pad to resize images') 44 | parser.add_argument('--retest', default=False, type=bool, 45 | help='test cache results') 46 | parser.add_argument('--save_folder', default='./eval/', 47 | help='results path') 48 | return parser.parse_args() 49 | 50 | def test_net(cfg, save_folder, input_wh, net, cuda, testset, 51 | max_per_image=300, thresh=0.05, nms_conf=0.4): 52 | """Test a Fast R-CNN network on an image database.""" 53 | num_images = len(testset) 54 | # all detections are collected into: 55 | # all_boxes[cls][image] = N x 5 array of detections in 56 | # (x1, y1, x2, y2, score) 57 | num_images = len(testset) 58 | num_classes = cfg["num_classes"] 59 | all_boxes = [[[] for _ in range(num_images)] 60 | for _ in range(num_classes)] 61 | 62 | if not os.path.exists(save_folder): 63 | os.mkdir(save_folder) 64 | # timers 65 | _t = {'im_detect': Timer(), 'misc': Timer()} 66 | det_file = os.path.join(save_folder, 'detections.pkl') 67 | 68 | if args.retest: 69 | f = open(det_file,'rb') 70 | all_boxes = pickle.load(f) 71 | print('Evaluating detections') 72 | testset.evaluate_detections(all_boxes, save_folder) 73 | return 74 | 75 | for i in range(num_images): 76 | img, img_id = testset.pull_image(i) 77 | ori_wh = (img.shape[1], img.shape[0]) 78 | img = preproc_for_test(img, input_wh, use_pad) 79 | x = img 80 | if cuda: 81 | x = x.cuda() 82 | 83 | _t['im_detect'].tic() 84 | out = net(x) # forward pass 85 | detections = detection_postprecess(out, thresh, num_classes, input_wh, ori_wh, use_pad=use_pad, nms_conf=nms_conf) 86 | boxes, scores, cls_inds = detections[:, :4], detections[:,4], detections[:, -1] 87 | detect_time = _t['im_detect'].toc() 88 | if len(boxes) == 0: 89 | continue 90 | 91 | _t['misc'].tic() 92 | for j in range(num_classes): 93 | inds = np.where(cls_inds == j)[0] 94 | if len(inds) == 0: 95 | all_boxes[j][i] = np.empty([0, 5], dtype=np.float32) 96 | continue 97 | c_bboxes = boxes[inds] 98 | c_scores = scores[inds] 99 | c_dets = np.hstack((c_bboxes, c_scores[:, np.newaxis])).astype( 100 | np.float32, copy=False) 101 | all_boxes[j][i] = c_dets 102 | 103 | if max_per_image > 0: 104 | image_scores = np.hstack([all_boxes[j][i][:, -1] for j in range(num_classes)]) 105 | if len(image_scores) > max_per_image: 106 | image_thresh = np.sort(image_scores)[-max_per_image] 107 | for j in range(num_classes): 108 | keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0] 109 | all_boxes[j][i] = all_boxes[j][i][keep, :] 110 | nms_time = _t['misc'].toc() 111 | 112 | if i % 20 == 0: 113 | print('im_detect: {:d}/{:d} {:.3f}s {:.3f}s' 114 | .format(i + 1, num_images, detect_time, nms_time)) 115 | _t['im_detect'].clear() 116 | _t['misc'].clear() 117 | 118 | with open(det_file, 'wb') as f: 119 | 
pickle.dump(all_boxes, f, pickle.HIGHEST_PROTOCOL) 120 | print('Evaluating detections') 121 | testset.evaluate_detections(all_boxes, save_folder) 122 | 123 | if __name__ == '__main__': 124 | args = arg_parse() 125 | weightsfile = args.weights 126 | nms_thresh = args.nms_thresh 127 | input_wh = args.input_wh 128 | cuda = args.cuda 129 | use_pad = args.use_pad 130 | save_folder = args.save_folder 131 | dataset = args.dataset 132 | if dataset[0] == "V": 133 | cfg = voc_config 134 | test_dataset = VOCDetection(cfg["root"], datasets_dict["VOC2007"], input_wh) 135 | elif dataset[0] == "C": 136 | cfg = coco_config 137 | test_dataset = COCODetection(cfg["root"], datasets_dict["COCOval"], input_wh) 138 | else: 139 | print("only support VOC and COCO datasets !!!") 140 | 141 | print("load test_dataset successfully.....") 142 | 143 | with open(cfg["name_path"], "r") as f: 144 | classes = [i.strip() for i in f.readlines()] 145 | 146 | net = Yolov3("test", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 147 | state_dict = torch.load(weightsfile) 148 | from collections import OrderedDict 149 | new_state_dict = OrderedDict() 150 | for k, v in state_dict.items(): 151 | head = k[:7] 152 | if head == 'module.': 153 | name = k[7:] # remove `module.` 154 | else: 155 | name = k 156 | new_state_dict[name] = v 157 | 158 | if cuda: 159 | net.cuda() 160 | cudnn.benchmark = True 161 | net.load_state_dict(new_state_dict) 162 | print("load weights successfully.....") 163 | net.eval() 164 | 165 | top_k = 200 166 | confidence = 0.01 167 | test_net(cfg, save_folder, input_wh, net, args.cuda, test_dataset, top_k, confidence, nms_thresh) 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | -------------------------------------------------------------------------------- /images/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/dog.jpg -------------------------------------------------------------------------------- /images/eagle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/eagle.jpg -------------------------------------------------------------------------------- /images/person.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/person.jpg -------------------------------------------------------------------------------- /layers/multiyolo_loss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import numpy as np 9 | from .weight_mseloss import WeightMseLoss 10 | from utils.box_utils import targets_match_all, permute_sigmoid, decode 11 | 12 | class MultiYoloLoss(nn.Module): 13 | 14 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask, use_gpu=True): 15 | super(MultiYoloLoss, self).__init__() 16 | self.num_classes = num_classes 17 | self.ignore_thresh = ignore_thresh 18 | self.use_gpu = use_gpu 19 | self.anchors = anchors 20 | self.mse_loss = nn.MSELoss(size_average=False) 21 | self.bce_loss = nn.BCELoss(size_average=False) 22 | self.weight_mseloss = 
WeightMseLoss(size_average=False) 23 | self.input_wh = input_wh 24 | self.anchors_mask = anchors_mask 25 | 26 | def forward(self, x, targets, input_wh, debug=False): 27 | self.input_wh = input_wh 28 | l_data, m_data, h_data = x 29 | l_grid_wh = (l_data.size(3), l_data.size(2)) 30 | m_grid_wh = (m_data.size(3), m_data.size(2)) 31 | h_grid_wh = (h_data.size(3), h_data.size(2)) 32 | feature_dim = (l_grid_wh, m_grid_wh, h_grid_wh) 33 | batch_size = l_data.size(0) 34 | pred_l, stride_l = permute_sigmoid(l_data, self.input_wh, 3, self.num_classes) 35 | pred_m, stride_m = permute_sigmoid(m_data, self.input_wh, 3, self.num_classes) 36 | pred_h, stride_h = permute_sigmoid(h_data, self.input_wh, 3, self.num_classes) 37 | pred = torch.cat((pred_l, pred_m, pred_h), 1) 38 | 39 | anchors1 = self.anchors[self.anchors_mask[0][0]: self.anchors_mask[0][-1]+1] 40 | anchors2 = self.anchors[self.anchors_mask[1][0]: self.anchors_mask[1][-1]+1] 41 | anchors3 = self.anchors[self.anchors_mask[2][0]: self.anchors_mask[2][-1]+1] 42 | 43 | decode_l = decode(pred_l.new_tensor(pred_l).detach(), self.input_wh, anchors1, self.num_classes, stride_l) 44 | decode_m = decode(pred_m.new_tensor(pred_m).detach(), self.input_wh, anchors2, self.num_classes, stride_m) 45 | decode_h = decode(pred_h.new_tensor(pred_h).detach(), self.input_wh, anchors3, self.num_classes, stride_h) 46 | decode_pred = torch.cat((decode_l, decode_m, decode_h), 1) 47 | 48 | num_pred = pred_l.size(1) + pred_m.size(1) + pred_h.size(1) 49 | 50 | # prediction targets x,y,w,h,objectness, class 51 | pred_t = torch.Tensor(batch_size, num_pred, 6).cuda() 52 | # xywh scale, scale = 2 - truth.w * truth.h (if truth is normlized to 1) 53 | scale_t = torch.FloatTensor(batch_size, num_pred).cuda() 54 | # foreground targets mask 55 | fore_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 56 | 57 | # background targets mask, we only calculate the objectness pred loss 58 | back_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 59 | 60 | for idx in range(batch_size): 61 | # match all targets 62 | targets_match_all(self.input_wh, self.ignore_thresh, targets[idx], decode_pred[idx][:, :4], self.anchors, feature_dim, pred_t, scale_t, fore_mask_t, back_mask_t, num_pred, idx) 63 | 64 | scale_factor = scale_t[fore_mask_t].view(-1, 1) 65 | scale_factor = scale_factor.expand((scale_factor.size(0), 2)) 66 | cls_t = pred_t[..., 5][fore_mask_t].long().view(-1, 1) 67 | cls_pred = pred[..., 5:] 68 | 69 | # cls loss 70 | cls_fore_mask_t = fore_mask_t.new_tensor(fore_mask_t).view(batch_size, num_pred, 1).expand_as(cls_pred) 71 | cls_pred = cls_pred[cls_fore_mask_t].view(-1, self.num_classes) 72 | class_mask = cls_pred.data.new(cls_t.size(0), self.num_classes).fill_(0) 73 | class_mask.scatter_(1, cls_t, 1.) 
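# class_mask is now a one-hot target for each matched prediction; BCE over the
# per-class sigmoid outputs below gives YOLOv3's independent multi-label class loss.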
74 | cls_loss = self.bce_loss(cls_pred, class_mask) 75 | ave_cls = (class_mask * cls_pred).sum().item() / cls_pred.size(0) 76 | 77 | # conf loss 78 | conf_t = pred_t[..., 4] 79 | fore_conf_t = conf_t[fore_mask_t].view(-1, 1) 80 | back_conf_t = conf_t[back_mask_t].view(-1, 1) 81 | fore_conf_pred = pred[..., 4][fore_mask_t].view(-1, 1) 82 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1) 83 | fore_num = fore_conf_pred.size(0) 84 | back_num = back_conf_pred.size(0) 85 | Obj = fore_conf_pred.sum().item() / fore_num 86 | no_obj = back_conf_pred.sum().item() / back_num 87 | 88 | fore_conf_loss = self.bce_loss(fore_conf_pred, fore_conf_t) 89 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t) 90 | conf_loss = fore_conf_loss + back_conf_loss 91 | 92 | # loc loss 93 | loc_pred = pred[..., :4] 94 | loc_t = pred_t[..., :4] 95 | fore_mask_t = fore_mask_t.view(batch_size, num_pred, 1).expand_as(loc_pred) 96 | loc_t = loc_t[fore_mask_t].view(-1, 4) 97 | loc_pred = loc_pred[fore_mask_t].view(-1, 4) 98 | 99 | xy_t, wh_t = loc_t[:, :2], loc_t[:, 2:] 100 | xy_pred, wh_pred = loc_pred[:, :2], loc_pred[:, 2:] 101 | # xy_loss = F.binary_cross_entropy(xy_pred, xy_t, scale_factor, size_average=False) 102 | 103 | xy_loss = self.weight_mseloss(xy_pred, xy_t, scale_factor) / 2 104 | wh_loss = self.weight_mseloss(wh_pred, wh_t, scale_factor) / 2 105 | 106 | loc_loss = xy_loss + wh_loss 107 | 108 | loc_loss /= batch_size 109 | conf_loss /= batch_size 110 | cls_loss /= batch_size 111 | 112 | if debug: 113 | print("xy_loss", round(xy_loss.item(), 5), "wh_loss", round(wh_loss.item(), 5), "cls_loss", round(cls_loss.item(), 5), "ave_cls", round(ave_cls, 5), "Obj", round(Obj, 5), "no_obj", round(no_obj, 5), "fore_conf_loss", round(fore_conf_loss.item(), 5), 114 | "back_conf_loss", round(back_conf_loss.item(), 5)) 115 | 116 | loss = loc_loss + conf_loss + cls_loss 117 | 118 | return loss 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | -------------------------------------------------------------------------------- /layers/weight_mseloss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | from torch.autograd import Variable 9 | 10 | 11 | class WeightMseLoss(nn.Module): 12 | def __init__(self, size_average=True): 13 | super(WeightMseLoss, self).__init__() 14 | self.size_average = size_average 15 | 16 | def forward(self, inputs, targets, weights): 17 | ''' inputs is N * C 18 | targets is N * C 19 | weights is N * C 20 | ''' 21 | N = inputs.size(0) 22 | C = inputs.size(1) 23 | 24 | out = (targets - inputs) 25 | out = weights * torch.pow(out, 2) 26 | loss = out.sum() 27 | 28 | if self.size_average: 29 | loss = loss / (N * C) 30 | return loss 31 | -------------------------------------------------------------------------------- /layers/yolo_layer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import numpy as np 9 | import math 10 | from .weight_mseloss import WeightMseLoss 11 | from utils.box_utils import targets_match_single, permute_sigmoid, decode 12 | 13 | class YoloLayer(nn.Module): 14 | 15 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask,use_gpu=True): 16 | super(YoloLayer, 
self).__init__() 17 | self.num_classes = num_classes 18 | self.ignore_thresh = ignore_thresh 19 | self.use_gpu = use_gpu 20 | self.anchors = anchors 21 | self.anchors_mask = anchors_mask 22 | self.input_wh = input_wh 23 | self.mse_loss = nn.MSELoss(size_average=False) 24 | self.bce_loss = nn.BCELoss(size_average=False) 25 | self.weight_mseloss = WeightMseLoss(size_average=False) 26 | 27 | def forward(self, x, targets, input_wh, debug=False): 28 | self.input_wh = input_wh 29 | batch_size = x.size(0) 30 | # feature map size w, h, this produce wxh cells to predict 31 | grid_wh = (x.size(3), x.size(2)) 32 | x, stride = permute_sigmoid(x, input_wh, 3, self.num_classes) 33 | pred = x 34 | num_pred = pred.size(1) 35 | 36 | decode_pred = decode(pred.new_tensor(pred).detach(), self.input_wh, self.anchors[self.anchors_mask[0]: self.anchors_mask[-1]+1], self.num_classes, stride) 37 | 38 | # prediction targets x,y,w,h,objectness, class 39 | pred_t = torch.Tensor(batch_size, num_pred, 6).cuda() 40 | # xywh scale, scale = 2 - truth.w * truth.h (if truth is normlized to 1) 41 | scale_t = torch.FloatTensor(batch_size, num_pred).cuda() 42 | # foreground targets mask 43 | fore_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 44 | 45 | # background targets mask, we only calculate the objectness pred loss 46 | back_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 47 | 48 | for idx in range(batch_size): 49 | # match our targets 50 | targets_match_single(self.input_wh, self.ignore_thresh, targets[idx], decode_pred[idx][:, :4], self.anchors, self.anchors_mask, pred_t, scale_t, fore_mask_t, back_mask_t, grid_wh, idx) 51 | 52 | cls_t = pred_t[..., 5][fore_mask_t].long().view(-1, 1) 53 | cls_pred = pred[..., 5:] 54 | conf_t = pred_t[..., 4] 55 | if cls_t.size(0) == 0: 56 | print("grid_wh {} no matching anchors".format(grid_wh)) 57 | back_conf_t = conf_t[back_mask_t].view(-1, 1) 58 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1) 59 | back_num = back_conf_pred.size(0) 60 | no_obj = back_conf_pred.sum().item() / back_num 61 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t) 62 | if debug: 63 | print("grid_wh", grid_wh, "loc_loss", 0, "conf_loss", round(back_conf_loss.item(), 5), "cls_loss", 0, "Obj", 0, "no_obj", round(no_obj, 5)) 64 | return torch.zeros(1), back_conf_loss, torch.zeros(1) 65 | 66 | scale_factor = scale_t[fore_mask_t].view(-1, 1) 67 | scale_factor = scale_factor.expand((scale_factor.size(0), 2)) 68 | 69 | # cls loss 70 | cls_fore_mask_t = fore_mask_t.new_tensor(fore_mask_t).view(batch_size, num_pred, 1).expand_as(cls_pred) 71 | cls_pred = cls_pred[cls_fore_mask_t].view(-1, self.num_classes) 72 | class_mask = cls_pred.data.new(cls_t.size(0), self.num_classes).fill_(0) 73 | class_mask.scatter_(1, cls_t, 1.) 
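# ave_cls computed below is a monitoring statistic only (mean probability assigned
# to the true class); it is printed in debug mode and does not contribute to the loss.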
74 | cls_loss = self.bce_loss(cls_pred, class_mask) 75 | ave_cls = (class_mask * cls_pred).sum().item() / cls_pred.size(0) 76 | 77 | # conf loss 78 | fore_conf_t = conf_t[fore_mask_t].view(-1, 1) 79 | back_conf_t = conf_t[back_mask_t].view(-1, 1) 80 | fore_conf_pred = pred[..., 4][fore_mask_t].view(-1, 1) 81 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1) 82 | fore_num = fore_conf_pred.size(0) 83 | back_num = back_conf_pred.size(0) 84 | Obj = fore_conf_pred.sum().item() / fore_num 85 | no_obj = back_conf_pred.sum().item() / back_num 86 | 87 | fore_conf_loss = self.bce_loss(fore_conf_pred, fore_conf_t) 88 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t) 89 | conf_loss = fore_conf_loss + back_conf_loss 90 | 91 | # loc loss 92 | loc_pred = pred[..., :4] 93 | loc_t = pred_t[..., :4] 94 | fore_mask_t = fore_mask_t.view(batch_size, num_pred, 1).expand_as(loc_pred) 95 | loc_t = loc_t[fore_mask_t].view(-1, 4) 96 | loc_pred = loc_pred[fore_mask_t].view(-1, 4) 97 | 98 | xy_t, wh_t = loc_t[:, :2], loc_t[:, 2:] 99 | xy_pred, wh_pred = loc_pred[:, :2], loc_pred[:, 2:] 100 | # xy_loss = F.binary_cross_entropy(xy_pred, xy_t, scale_factor, size_average=False) 101 | 102 | xy_loss = self.weight_mseloss(xy_pred, xy_t, scale_factor) / 2 103 | wh_loss = self.weight_mseloss(wh_pred, wh_t, scale_factor) / 2 104 | 105 | loc_loss = xy_loss + wh_loss 106 | 107 | loc_loss /= batch_size 108 | conf_loss /= batch_size 109 | cls_loss /= batch_size 110 | 111 | if debug: 112 | print("grid_wh", grid_wh, "xy_loss", round(xy_loss.item(), 5), "wh_loss", round(wh_loss.item(), 5), "cls_loss", round(cls_loss.item(), 5), "ave_cls", round(ave_cls, 5), "Obj", round(Obj, 5), "no_obj", round(no_obj, 5), "fore_conf_loss", round(fore_conf_loss.item(), 5), 113 | "back_conf_loss", round(back_conf_loss.item(), 5)) 114 | 115 | return loc_loss, conf_loss, cls_loss 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | -------------------------------------------------------------------------------- /layers/yolo_loss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import torch.nn.init as init 9 | import os 10 | from layers.yolo_layer import YoloLayer 11 | 12 | 13 | class YoloLoss(nn.Module): 14 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask, use_gpu=True): 15 | super(YoloLoss, self).__init__() 16 | self.input_wh = input_wh 17 | self.num_classes = num_classes 18 | self.ignore_thresh = ignore_thresh 19 | self.use_gpu = use_gpu 20 | self.anchors = anchors 21 | self.anchors_mask = anchors_mask 22 | self.yolo_layer1 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[0]) 23 | self.yolo_layer2 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[1]) 24 | self.yolo_layer3 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[2]) 25 | 26 | def forward(self, inputs, targets, input_wh, debug): 27 | self.input_wh = input_wh 28 | x, y, z = inputs 29 | batch_size = x.size(0) 30 | loc_loss1, conf_loss1, cls_loss1 = self.yolo_layer1(x, targets, self.input_wh, debug) 31 | loc_loss2, conf_loss2, cls_loss2 = self.yolo_layer2(y, targets, self.input_wh, debug) 32 | loc_loss3, conf_loss3, cls_loss3 = self.yolo_layer3(z, targets, self.input_wh, debug) 33 | loc_loss = loc_loss1 + 
loc_loss2 + loc_loss3 34 | conf_loss = conf_loss1 + conf_loss2 + conf_loss3 35 | cls_loss = cls_loss1 + cls_loss2 + cls_loss3 36 | loss = loc_loss + conf_loss + cls_loss 37 | return loss -------------------------------------------------------------------------------- /make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | cd ./utils/ 3 | 4 | CUDA_PATH=/usr/local/cuda/ 5 | 6 | python build.py build_ext --inplace 7 | # if you use anaconda3 maybe you need add this 8 | mv nms/cpu_nms.cpython-36m-x86_64-linux-gnu.so nms/cpu_nms.so 9 | mv nms/gpu_nms.cpython-36m-x86_64-linux-gnu.so nms/gpu_nms.so 10 | cd .. 11 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/model/__init__.py -------------------------------------------------------------------------------- /model/darknet53.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | 9 | class ConvBN(nn.Module): 10 | def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0): 11 | super().__init__() 12 | self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride, padding=padding, bias=False) 13 | self.bn = nn.BatchNorm2d(ch_out, momentum=0.01, eps=1e-05, affine=True) 14 | 15 | def forward(self, x): 16 | return F.leaky_relu(self.bn(self.conv(x)), negative_slope=0.1, inplace=True) 17 | 18 | class DarknetBlock(nn.Module): 19 | def __init__(self, ch_in): 20 | super().__init__() 21 | ch_hid = ch_in // 2 22 | self.conv1 = ConvBN(ch_in, ch_hid, kernel_size=1, stride=1, padding=0) 23 | self.conv2 = ConvBN(ch_hid, ch_in, kernel_size=3, stride=1, padding=1) 24 | 25 | def forward(self, x): 26 | out = self.conv1(x) 27 | out = self.conv2(out) 28 | return out + x 29 | 30 | class Darknet19(nn.Module): 31 | def __init__(self, size): 32 | super().__init__() 33 | self.conv = ConvBN(3, 32, kernel_size=3, stride=1, padding=1) 34 | self.layer1 = self._make_layer1() 35 | self.layer2 = self._make_layer2() 36 | self.layer3 = self._make_layer3() 37 | self.layer4 = self._make_layer4() 38 | self.layer5 = self._make_layer5() 39 | 40 | def _make_layer1(self): 41 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 42 | ConvBN(32, 64, kernel_size=3, stride=1, padding=1)] 43 | return nn.Sequential(*layers) 44 | 45 | def _make_layer2(self): 46 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 47 | ConvBN(64, 128, kernel_size=3, stride=1, padding=1), 48 | ConvBN(128, 64, kernel_size=1, stride=1, padding=1), 49 | ConvBN(64, 128, kernel_size=3, stride=1, padding=1)] 50 | return nn.Sequential(*layers) 51 | 52 | def _make_layer3(self): 53 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 54 | ConvBN(128, 256, kernel_size=3, stride=1, padding=1), 55 | ConvBN(256, 128, kernel_size=1, stride=1, padding=1), 56 | ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 57 | return nn.Sequential(*layers) 58 | 59 | def _make_layer4(self): 60 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 61 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1), 62 | ConvBN(512, 256, kernel_size=1, stride=1, padding=1), 63 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1), 64 | 
ConvBN(512, 256, kernel_size=1, stride=1, padding=1), 65 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 66 | return nn.Sequential(*layers) 67 | 68 | def _make_layer5(self): 69 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 70 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1), 71 | ConvBN(1024, 512, kernel_size=1, stride=1, padding=1), 72 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1), 73 | ConvBN(1024, 512, kernel_size=1, stride=1, padding=1), 74 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 75 | return nn.Sequential(*layers) 76 | 77 | def forward(self, x): 78 | out = self.conv(x) 79 | 80 | c1 = self.layer1(out) 81 | c2 = self.layer2(c1) 82 | c3 = self.layer3(c2) 83 | c4 = self.layer4(c3) 84 | c5 = self.layer5(c4) 85 | return (c3, c4, c5) 86 | 87 | 88 | class Darknet53(nn.Module): 89 | def __init__(self, num_blocks): 90 | super().__init__() 91 | self.conv = ConvBN(3, 32, kernel_size=3, stride=1, padding=1) 92 | self.layer1 = self._make_layer(32, num_blocks[0], stride=2) 93 | self.layer2 = self._make_layer(64, num_blocks[1], stride=2) 94 | self.layer3 = self._make_layer(128, num_blocks[2], stride=2) 95 | self.layer4 = self._make_layer(256, num_blocks[3], stride=2) 96 | self.layer5 = self._make_layer(512, num_blocks[4], stride=2) 97 | 98 | def _make_layer(self, ch_in, num_blocks, stride=1): 99 | layers = [ConvBN(ch_in, ch_in*2, stride=stride, padding=1)] 100 | for i in range(num_blocks): 101 | layers.append(DarknetBlock(ch_in * 2)) 102 | return nn.Sequential(*layers) 103 | 104 | def forward(self, x): 105 | out = self.conv(x) 106 | c1 = self.layer1(out) 107 | c2 = self.layer2(c1) 108 | c3 = self.layer3(c2) 109 | c4 = self.layer4(c3) 110 | c5 = self.layer5(c4) 111 | return (c3, c4, c5) 112 | 113 | -------------------------------------------------------------------------------- /model/yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import torch.nn.init as init 9 | from model.darknet53 import Darknet53 10 | import os 11 | from utils.box_utils import permute_sigmoid, decode 12 | from layers.yolo_layer import YoloLayer 13 | 14 | def xavier(param): 15 | init.xavier_uniform(param) 16 | 17 | # kaiming_weights_init 18 | def weights_init(m): 19 | for key in m.state_dict(): 20 | if key.split('.')[-1] == 'weight': 21 | if 'conv' in key: 22 | init.kaiming_normal_(m.state_dict()[key], mode='fan_out') 23 | if 'bn' in key: 24 | m.state_dict()[key][...] = 1 25 | elif key.split('.')[-1] == 'bias': 26 | m.state_dict()[key][...] = 0 27 | 28 | 29 | # def weights_init(m): 30 | # for key in m.state_dict(): 31 | # if key.split('.')[-1] == 'weight': 32 | # if 'conv' in key: 33 | # init.xavier_uniform(m.state_dict()[key]) 34 | # if 'bn' in key: 35 | # m.state_dict()[key][...] = 1 36 | # elif key.split('.')[-1] == 'bias': 37 | # m.state_dict()[key][...] 
= 0 38 | 39 | class ConvBN(nn.Module): 40 | def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0): 41 | super().__init__() 42 | self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride, padding=padding, bias=False) 43 | self.bn = nn.BatchNorm2d(ch_out, momentum=0.01) 44 | 45 | def forward(self, x): 46 | return F.leaky_relu(self.bn(self.conv(x)), negative_slope=0.1, inplace=True) 47 | 48 | class DetectionLayer(nn.Module): 49 | def __init__(self, anchors, anchors_mask, input_wh, num_classes): 50 | super(DetectionLayer, self).__init__() 51 | self.anchors = anchors 52 | self.input_wh = input_wh 53 | self.anchors_mask = anchors_mask 54 | self.num_classes = num_classes 55 | 56 | def forward(self, x): 57 | l_data, m_data, h_data = x 58 | l_grid_wh = (l_data.size(3), l_data.size(2)) 59 | m_grid_wh = (m_data.size(3), m_data.size(2)) 60 | h_grid_wh = (h_data.size(3), h_data.size(2)) 61 | 62 | pred_l, stride_l = permute_sigmoid(l_data, self.input_wh, 3, self.num_classes) 63 | pred_m, stride_m = permute_sigmoid(m_data, self.input_wh, 3, self.num_classes) 64 | pred_h, stride_h = permute_sigmoid(h_data, self.input_wh, 3, self.num_classes) 65 | 66 | anchors1 = self.anchors[self.anchors_mask[0][0]: self.anchors_mask[0][-1]+1] 67 | anchors2 = self.anchors[self.anchors_mask[1][0]: self.anchors_mask[1][-1]+1] 68 | anchors3 = self.anchors[self.anchors_mask[2][0]: self.anchors_mask[2][-1]+1] 69 | 70 | decode_l = decode(pred_l.detach(), self.input_wh, anchors1, self.num_classes, stride_l) 71 | decode_m = decode(pred_m.detach(), self.input_wh, anchors2, self.num_classes, stride_m) 72 | decode_h = decode(pred_h.detach(), self.input_wh, anchors3, self.num_classes, stride_h) 73 | decode_pred = torch.cat((decode_l, decode_m, decode_h), 1) 74 | 75 | return decode_pred 76 | 77 | def predict_conv_list1(num_classes): 78 | layers = list() 79 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)] 80 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 81 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)] 82 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 83 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)] 84 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 85 | layers += [nn.Conv2d(1024, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)] 86 | return layers 87 | 88 | def predict_conv_list2(num_classes): 89 | layers = list() 90 | layers += [ConvBN(768, 256, kernel_size=1, stride=1, padding=0)] 91 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 92 | layers += [ConvBN(512, 256, kernel_size=1, stride=1, padding=0)] 93 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 94 | layers += [ConvBN(512, 256, kernel_size=1, stride=1, padding=0)] 95 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 96 | layers += [nn.Conv2d(512, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)] 97 | return layers 98 | 99 | def predict_conv_list3(num_classes): 100 | layers = list() 101 | layers += [ConvBN(384, 128, kernel_size=1, stride=1, padding=0)] 102 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 103 | layers += [ConvBN(256, 128, kernel_size=1, stride=1, padding=0)] 104 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 105 | layers += [ConvBN(256, 128, kernel_size=1, stride=1, padding=0)] 106 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 107 | layers += [nn.Conv2d(256, (5 + num_classes) 
* 3, kernel_size=1, stride=1, padding=0)] 108 | return layers 109 | 110 | class YOLOv3(nn.Module): 111 | def __init__(self, phase, num_blocks, anchors, anchors_mask, input_wh, num_classes): 112 | super().__init__() 113 | self.phase = phase 114 | self.extractor = Darknet53(num_blocks) 115 | self.predict_conv_list1 = nn.ModuleList(predict_conv_list1(num_classes)) 116 | self.smooth_conv1 = ConvBN(512, 256, kernel_size=1, stride=1, padding=0) 117 | self.predict_conv_list2 = nn.ModuleList(predict_conv_list2(num_classes)) 118 | self.smooth_conv2 = ConvBN(256, 128, kernel_size=1, stride=1, padding=0) 119 | self.predict_conv_list3 = nn.ModuleList(predict_conv_list3(num_classes)) 120 | if phase == "test": 121 | self.detection = DetectionLayer(anchors, anchors_mask, input_wh, num_classes) 122 | 123 | def forward(self, x, targets=None): 124 | c3, c4, c5 = self.extractor(x) 125 | x = c5 126 | # predict_list1 127 | for i in range(5): 128 | x = self.predict_conv_list1[i](x) 129 | smt1 = self.smooth_conv1(x) 130 | smt1 = F.upsample(smt1, scale_factor=2, mode='nearest') 131 | 132 | smt1 = torch.cat((smt1, c4), 1) 133 | for i in range(5, 7): 134 | x = self.predict_conv_list1[i](x) 135 | out1 = x 136 | 137 | x = smt1 138 | for i in range(5): 139 | x = self.predict_conv_list2[i](x) 140 | smt2 = self.smooth_conv2(x) 141 | smt2 = F.upsample(smt2, scale_factor=2, mode='nearest') 142 | smt2 = torch.cat((smt2, c3), 1) 143 | for i in range(5, 7): 144 | x = self.predict_conv_list2[i](x) 145 | out2 = x 146 | x = smt2 147 | for i in range(7): 148 | x = self.predict_conv_list3[i](x) 149 | out3 = x 150 | 151 | if self.phase == "test": 152 | detections = self.detection((out1, out2, out3)) 153 | return detections 154 | elif self.phase == "train": 155 | detections = (out1, out2, out3) 156 | return detections 157 | 158 | def load_weights(self, base_file): 159 | other, ext = os.path.splitext(base_file) 160 | if ext == '.pkl' or '.pth': 161 | print('Loading weights into state dict...') 162 | self.extractor.load_state_dict(torch.load(base_file)) 163 | print("initing darknet53 ......") 164 | self.predict_conv_list1.apply(weights_init) 165 | self.smooth_conv1.apply(weights_init) 166 | self.predict_conv_list2.apply(weights_init) 167 | self.smooth_conv2.apply(weights_init) 168 | self.predict_conv_list3.apply(weights_init) 169 | print('Finished!') 170 | else: 171 | print('Sorry only .pth and .pkl files supported.') 172 | 173 | def Yolov3(phase, input_wh, anchors, anchors_mask, num_classes): 174 | num_blocks = [1,2,8,8,4] 175 | return YOLOv3(phase, num_blocks, anchors, anchors_mask, input_wh, num_classes) 176 | -------------------------------------------------------------------------------- /output/output_dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_dog.jpg -------------------------------------------------------------------------------- /output/output_eagle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_eagle.jpg -------------------------------------------------------------------------------- /output/output_person.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_person.jpg 
-------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | import os 5 | os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" 6 | import torch 7 | import torch.nn as nn 8 | import torch.optim as optim 9 | import torch.backends.cudnn as cudnn 10 | import torch.nn.init as init 11 | import argparse 12 | import torch.utils.data as data 13 | from data.voc0712 import VOCDetection, detection_collate 14 | from data.coco import COCODetection 15 | from model.yolo import Yolov3 16 | from data.config import voc_config, coco_config 17 | from layers.yolo_loss import YoloLoss 18 | from layers.multiyolo_loss import MultiYoloLoss 19 | import numpy as np 20 | import time 21 | import os 22 | import sys 23 | 24 | 25 | def arg_parse(): 26 | """ 27 | Parse arguments to the train module 28 | """ 29 | parser = argparse.ArgumentParser( 30 | description='Yolov3 pytorch Training') 31 | parser.add_argument('-v', '--version', default='yolov3', 32 | help='') 33 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416]) 34 | parser.add_argument('-d', '--dataset', default='VOC', 35 | help='VOC or COCO dataset') 36 | parser.add_argument('-b', '--batch_size', default=64, 37 | type=int, help='Batch size for training') 38 | parser.add_argument('--basenet', default='./weights/convert_darknet53.pth', help='pretrained base model') 39 | parser.add_argument('--ignore_thresh', default=0.5, 40 | type=float, help='ignore_thresh') 41 | parser.add_argument('--subdivisions', default=4, 42 | type=int, help='subdivisions for large batch_size') 43 | parser.add_argument('--num_workers', default=4, 44 | type=int, help='Number of workers used in dataloading') 45 | parser.add_argument('--cuda', default=True, 46 | type=bool, help='Use cuda to train model') 47 | parser.add_argument('--merge_yolo_loss', default=True, 48 | type=bool, help='merge yolo loss') 49 | parser.add_argument('--lr', '--learning-rate', 50 | default=1e-3, type=float, help='initial learning rate') 51 | parser.add_argument('--ngpu', default=2, type=int, help='gpus') 52 | 53 | parser.add_argument('--resume_net', default=None, 54 | help='resume net for retraining') 55 | parser.add_argument('--resume_epoch', default=0, 56 | type=int, help='resume iter for retraining') 57 | parser.add_argument('-max','--max_epoch', default=200, 58 | type=int, help='max epoch for retraining') 59 | parser.add_argument('--save_folder', default='./weights/', 60 | help='Location to save checkpoint models') 61 | 62 | return parser.parse_args() 63 | 64 | def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size): 65 | """Sets the learning rate 66 | # Adapted from PyTorch Imagenet example: 67 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 68 | """ 69 | if iteration < 1000: 70 | # warm up training 71 | lr = 0.001 * pow((iteration)/1000, 4) 72 | else: 73 | lr = args.lr * (gamma ** (step_index)) 74 | for param_group in optimizer.param_groups: 75 | param_group['lr'] = lr 76 | return lr 77 | 78 | 79 | if __name__ == '__main__': 80 | args = arg_parse() 81 | basenet = args.basenet 82 | save_folder = args.save_folder 83 | input_wh = args.input_wh 84 | batch_size = args.batch_size 85 | weight_decay = 0.0005 86 | gamma = 0.1 87 | momentum = 0.9 88 | cuda = args.cuda 89 | dataset_name = args.dataset 90 | subdivisions = args.subdivisions 91 | ignore_thresh = 
args.ignore_thresh 92 | merge_yolo_loss = args.merge_yolo_loss 93 | if not os.path.exists(save_folder): 94 | os.mkdir(save_folder) 95 | if cuda and torch.cuda.is_available(): 96 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 97 | else: 98 | torch.set_default_tensor_type('torch.FloatTensor') 99 | 100 | # different datasets, include coco, voc0712 trainval, coco val 101 | datasets_version = {"VOC": [('0712', '0712_trainval')], 102 | "VOC0712++": [('0712', '0712_trainval_test')], 103 | "VOC2012" : [('2012', '2012_trainval')], 104 | "COCO": [('2014', 'train'), ('2014', 'valminusminival')], 105 | "VOC2007": [('0712', "2007_test")], 106 | "COCOval": [('2014', 'minival')]} 107 | 108 | print('Loading Dataset...') 109 | if dataset_name[0] == "V": 110 | cfg = voc_config 111 | train_dataset = VOCDetection(cfg["root"], datasets_version[dataset_name], input_wh, batch_size, cfg["multiscale"], dataset_name) 112 | elif dataset_name[0] == "C": 113 | cfg = coco_config 114 | train_dataset = COCODetection(cfg["root"], datasets_version[dataset_name], input_wh, batch_size, cfg["multiscale"], dataset_name) 115 | else: 116 | print('Unkown dataset!') 117 | 118 | # load Yolov3 net 119 | net = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 120 | if args.resume_net == None: 121 | net.load_weights(basenet) 122 | else: 123 | state_dict = torch.load(args.resume_net) 124 | from collections import OrderedDict 125 | new_state_dict = OrderedDict() 126 | for k, v in state_dict.items(): 127 | head = k[:7] 128 | if head == 'module.': 129 | name = k[7:] # remove `module.` 130 | else: 131 | name = k 132 | new_state_dict[name] = v 133 | net.load_state_dict(new_state_dict) 134 | print('Loading resume network...') 135 | 136 | if args.ngpu > 1: 137 | net = torch.nn.DataParallel(net) 138 | 139 | if args.cuda: 140 | net.cuda() 141 | cudnn.benchmark = True 142 | 143 | optimizer = optim.SGD(net.parameters(), lr=args.lr, 144 | momentum=momentum, weight_decay=weight_decay) 145 | 146 | # load yolo loss 147 | if merge_yolo_loss: 148 | criterion = MultiYoloLoss(input_wh, cfg["num_classes"], ignore_thresh, cfg["anchors"], cfg["anchors_mask"]) 149 | else: 150 | criterion = YoloLoss(input_wh, cfg["num_classes"], ignore_thresh, cfg["anchors"], cfg["anchors_mask"]) 151 | net.train() 152 | ave_loss = -1 153 | epoch = 0 + args.resume_epoch 154 | mini_batch_size = int(batch_size / subdivisions) 155 | 156 | epoch_size = len(train_dataset) // (batch_size) 157 | max_iter = args.max_epoch * epoch_size 158 | 159 | stepvalues_VOC = (160 * epoch_size, 180 * epoch_size, 201 * epoch_size) 160 | stepvalues_COCO = (90 * epoch_size, 120 * epoch_size, 140 * epoch_size) 161 | stepvalues = (stepvalues_VOC, stepvalues_COCO)[args.dataset=='COCO'] 162 | 163 | print('Training', args.version, 'on', train_dataset.name) 164 | step_index = 0 165 | 166 | if args.resume_epoch > 0: 167 | start_iter = args.resume_epoch * epoch_size 168 | else: 169 | start_iter = 0 170 | 171 | lr = args.lr 172 | 173 | # begin to train 174 | for iteration in range(start_iter, max_iter): 175 | if iteration % epoch_size == 0: 176 | batch_iterator = iter(data.DataLoader(train_dataset, 177 | mini_batch_size, 178 | shuffle=False, 179 | num_workers=args.num_workers, 180 | collate_fn=detection_collate)) 181 | if (epoch % 5 == 0 and epoch > 0) or (epoch % 5 == 0 and epoch > 200): 182 | torch.save(net.state_dict(), args.save_folder+args.version+'_'+args.dataset + '_epoches_'+ 183 | repr(epoch) + '.pth') 184 | epoch += 1 185 | 186 | load_t0 = time.time() 187 
| if iteration in stepvalues: 188 | step_index += 1 189 | lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size) 190 | debug = False 191 | if iteration % 10 == 0: 192 | debug = True 193 | optimizer.zero_grad() 194 | loss_sum = 0 195 | for i in range(subdivisions): 196 | images, targets = next(batch_iterator) 197 | images.requires_grad_() 198 | if args.cuda: 199 | images = images.cuda() 200 | with torch.no_grad(): 201 | targets = [anno.cuda() for anno in targets] 202 | else: 203 | images = images 204 | with torch.no_grad(): 205 | targets = targets 206 | # forward 207 | resize_wh = images.size(3), images.size(2) 208 | out = net(images) 209 | loss = criterion(out, targets, resize_wh, debug) / subdivisions 210 | loss.backward() 211 | loss_sum += loss.item() 212 | 213 | if ave_loss < 0: 214 | ave_loss = loss_sum 215 | ave_loss = 0.1 * loss_sum + 0.9 * ave_loss 216 | optimizer.step() 217 | load_t1 = time.time() 218 | if iteration % 10 == 0: 219 | print('Epoch:' + repr(epoch) + ' || epochiter: ' + repr(iteration % epoch_size) + '/' + repr(epoch_size) 220 | + '|| Totel iter ' + 221 | repr(iteration) + ' Cur : %.4f Ave : %.4f' % (loss_sum, ave_loss) + 222 | ' iteration time: %.4f sec. ||' % (load_t1 - load_t0) + 'LR: %.5f' % (lr)) 223 | 224 | torch.save(net.state_dict(), args.save_folder+args.version+'_'+args.dataset + "_final"+ '.pth') 225 | 226 | 227 | 228 | 229 | 230 | -------------------------------------------------------------------------------- /utils/box_utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | import numpy as np 8 | import math 9 | import cv2 10 | import time 11 | from utils.nms_wrapper import nms 12 | 13 | 14 | def get_rects(detection, input_wh, ori_wh, use_pad=False): 15 | if len(detection) > 0: 16 | if use_pad: 17 | scaling_factor = min(input_wh[0] / ori_wh[0], input_wh[1] / ori_wh[1]) 18 | detection[:,[1,3]] -= (input_wh[0] - scaling_factor * ori_wh[0]) / 2 19 | detection[:,[2,4]] -= (input_wh[1] - scaling_factor * ori_wh[1]) / 2 20 | detection[:,1:5] /= scaling_factor 21 | else: 22 | detection[:,[1,3]] /= input_wh[0] 23 | detection[:,[2,4]] /= input_wh[1] 24 | detection[:, [1,3]] *= ori_wh[0] 25 | detection[:, [2,4]] *= ori_wh[1] 26 | for i in range(detection.shape[0]): 27 | detection[i, [1,3]] = torch.clamp(detection[i, [1,3]], 0.0, ori_wh[0]) 28 | detection[i, [2,4]] = torch.clamp(detection[i, [2,4]], 0.0, ori_wh[1]) 29 | return detection 30 | 31 | def draw_rects(img, rects, classes): 32 | print(rects) 33 | for rect in rects: 34 | if rect[5] > 0.1: 35 | left_top = (int(rect[0]), int(rect[1])) 36 | right_bottom = (int(rect[2]), int(rect[3])) 37 | score = round(rect[4], 3) 38 | cls_id = int(rect[-1]) 39 | label = "{0}".format(classes[cls_id]) 40 | class_len = len(classes) 41 | offset = cls_id * 123457 % class_len 42 | red = get_color(2, offset, class_len) 43 | green = get_color(1, offset, class_len) 44 | blue = get_color(0, offset, class_len) 45 | color = (blue, green, red) 46 | cv2.rectangle(img, left_top, right_bottom, color, 2) 47 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0] 48 | right_bottom = left_top[0] + t_size[0] + 3, left_top[1] - t_size[1] - 4 49 | cv2.rectangle(img, left_top, right_bottom, color, -1) 50 | cv2.putText(img, str(label)+str(score), (left_top[0], left_top[1] - t_size[1] - 4), cv2.FONT_HERSHEY_PLAIN, 
1, [225,255,255], 1) 51 | return img 52 | 53 | def get_color(c, x, max_val): 54 | colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]]) 55 | ratio = float(x) / max_val * 5 56 | i = int(math.floor(ratio)) 57 | j = int(math.ceil(ratio)) 58 | ratio = ratio - i 59 | r = (1-ratio) * colors[i][c] + ratio * colors[j][c] 60 | return int(r*255) 61 | 62 | 63 | def unique(tensor): 64 | tensor_np = tensor.cpu().numpy() 65 | unique_np = np.unique(tensor_np) 66 | unique_tensor = torch.from_numpy(unique_np) 67 | 68 | tensor_res = tensor.new(unique_tensor.shape) 69 | tensor_res.copy_(unique_tensor) 70 | return tensor_res 71 | 72 | def point_form(boxes): 73 | """ Convert prior_boxes to (xmin, ymin, xmax, ymax) 74 | representation for comparison to point form ground truth data. 75 | Args: 76 | boxes: (tensor) center-size default boxes from priorbox layers. 77 | Return: 78 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 79 | """ 80 | return torch.cat((boxes[:, :2] - boxes[:, 2:]/2, # xmin, ymin 81 | boxes[:, :2] + boxes[:, 2:]/2), 1) # xmax, ymax 82 | 83 | def center_size(boxes): 84 | """ Convert prior_boxes to (cx, cy, w, h) 85 | representation for comparison to center-size form ground truth data. 86 | Args: 87 | boxes: (tensor) point_form boxes 88 | Return: 89 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 90 | """ 91 | return torch.cat([(boxes[:, 2:] + boxes[:, :2])/2, boxes[:, 2:] - boxes[:, :2]], 1) # w, h 92 | 93 | 94 | def intersect(box_a, box_b): 95 | """ We resize both tensors to [A,B,2] without new malloc: 96 | [A,2] -> [A,1,2] -> [A,B,2] 97 | [B,2] -> [1,B,2] -> [A,B,2] 98 | Then we compute the area of intersect between box_a and box_b. 99 | Args: 100 | box_a: (tensor) bounding boxes, Shape: [A,4]. 101 | box_b: (tensor) bounding boxes, Shape: [B,4]. 102 | Return: 103 | (tensor) intersection area, Shape: [A,B]. 104 | """ 105 | # print(box_a) 106 | A = box_a.size(0) 107 | B = box_b.size(0) 108 | max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2), 109 | box_b[:, 2:].unsqueeze(0).expand(A, B, 2)) 110 | min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2), 111 | box_b[:, :2].unsqueeze(0).expand(A, B, 2)) 112 | inter = torch.clamp((max_xy - min_xy), min=0) 113 | return inter[:, :, 0] * inter[:, :, 1] 114 | 115 | 116 | def jaccard(box_a, box_b): 117 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 118 | is simply the intersection over union of two boxes. Here we operate on 119 | ground truth boxes and default boxes. 
120 | E.g.: 121 | A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 122 | Args: 123 | box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4] 124 | box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4] 125 | Return: 126 | jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)] 127 | """ 128 | inter = intersect(box_a, box_b) 129 | area_a = ((box_a[:, 2]-box_a[:, 0]) * 130 | (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter) # [A,B] 131 | area_b = ((box_b[:, 2]-box_b[:, 0]) * 132 | (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter) # [A,B] 133 | union = area_a + area_b - inter 134 | return inter / union # [A,B] 135 | 136 | def trans_anchors(anchors): 137 | new_anchors = torch.zeros((anchors.size(0), 4)) 138 | new_anchors[:, :2] += 2000 139 | new_anchors[:, 2:] = anchors[:,] 140 | return point_form(new_anchors) 141 | 142 | def trans_truths(truths): 143 | new_truths = torch.zeros((truths.size(0), 4)) 144 | new_truths[:, :2] += 2000 145 | new_truths[:, 2:] = truths[:, 2:4] 146 | return point_form(new_truths) 147 | 148 | def int_index(anchors_mask, val): 149 | for i in range(len(anchors_mask)): 150 | if val == anchors_mask[i]: 151 | return i 152 | return -1 153 | 154 | def encode_targets_all(input_wh, truths, labels, best_anchor_idx, anchors, feature_dim, num_pred, back_mask): 155 | scale = torch.ones(num_pred).cuda() 156 | encode_truths = torch.zeros((num_pred, 6)).cuda() 157 | fore_mask = torch.zeros(num_pred).cuda() 158 | # l_dim, m_dim, h_dim = feature_dim 159 | l_grid_wh, m_grid_wh, h_grid_wh = feature_dim 160 | for i in range(best_anchor_idx.size(0)): 161 | index = 0 162 | grid_wh = (0, 0) 163 | # mask [0, 1, 2] 164 | if best_anchor_idx[i].item() < 2.1: 165 | grid_wh = l_grid_wh 166 | index_begin = 0 167 | # mask [3, 4, 5] 168 | elif best_anchor_idx[i].item() < 5.1: 169 | grid_wh = m_grid_wh 170 | index_begin = l_grid_wh[0] * l_grid_wh[1] * 3 171 | # mask [6, 7, 8] 172 | else: 173 | grid_wh = h_grid_wh 174 | index_begin = (l_grid_wh[0]*l_grid_wh[1] + m_grid_wh[0]*m_grid_wh[1]) * 3 175 | x = (truths[i][0] / input_wh[0]) * grid_wh[0] 176 | y = (truths[i][1] / input_wh[1]) * grid_wh[1] 177 | floor_x, floor_y = math.floor(x), math.floor(y) 178 | anchor_idx = best_anchor_idx[i].int().item() % 3 179 | index = index_begin + floor_y * grid_wh[0] * 3 + floor_x * 3 + anchor_idx 180 | 181 | scale[index] = scale[index] + 1. - (truths[i][2] / input_wh[0]) * (truths[i][3] / input_wh[1]) 182 | 183 | # encode targets x, y, w, h, objectness, class 184 | truths[i][0] = x - floor_x 185 | truths[i][1] = y - floor_y 186 | truths[i][2] = torch.log(truths[i][2] / anchors[best_anchor_idx[i]][0] + 1e-8) 187 | truths[i][3] = torch.log(truths[i][3] / anchors[best_anchor_idx[i]][1] + 1e-8) 188 | encode_truths[index, :4] = truths[i] 189 | encode_truths[index, 4] = 1. 190 | encode_truths[index, 5] = labels[i].int().item() 191 | 192 | # set foreground mask to 1 and background mask to 0, because pred should have unique target 193 | fore_mask[index] = 1. 
194 | back_mask[index] = 0 195 | 196 | return encode_truths, fore_mask > 0, scale, back_mask 197 | 198 | def encode_targets_single(input_wh, truths, labels, best_anchor_idx, anchors, anchors_mask, back_mask, grid_wh): 199 | grid_w, grid_h = grid_wh[0], grid_wh[1] 200 | num_pred = grid_w * grid_h * len(anchors_mask) 201 | scale = torch.ones(num_pred).cuda() 202 | encode_truths = torch.zeros((num_pred, 6)).cuda() 203 | fore_mask = torch.zeros(num_pred).cuda() 204 | 205 | for i in range(best_anchor_idx.size(0)): 206 | mask_n = int_index(anchors_mask, best_anchor_idx[i]) 207 | if mask_n < 0: 208 | continue 209 | x = (truths[i][0] / input_wh[0]) * grid_wh[0] 210 | y = (truths[i][1] / input_wh[1]) * grid_wh[1] 211 | floor_x, floor_y = math.floor(x), math.floor(y) 212 | index = floor_y * grid_wh[0] * 3 + floor_x * 3 + mask_n 213 | scale[index] = scale[index] + 1. - (truths[i][2] / input_wh[0]) * (truths[i][3] / input_wh[1]) 214 | truths[i][0] = x - floor_x 215 | truths[i][1] = y - floor_y 216 | truths[i][2] = torch.log(truths[i][2] / anchors[best_anchor_idx[i]][0] + 1e-8) 217 | truths[i][3] = torch.log(truths[i][3] / anchors[best_anchor_idx[i]][1] + 1e-8) 218 | encode_truths[index, :4] = truths[i] 219 | encode_truths[index, 4] = 1. 220 | encode_truths[index, 5] = labels[i].int().item() 221 | fore_mask[index] = 1. 222 | back_mask[index] = 0 223 | 224 | return encode_truths, fore_mask > 0, scale, back_mask 225 | 226 | def targets_match_single(input_wh, threshold, targets, pred, anchors, anchors_mask, pred_t, scale_t, fore_mask_t, back_mask_t, grid_wh, idx, cuda=True): 227 | loc_truths = targets[:, :4].data 228 | labels = targets[:,-1].data 229 | overlaps = jaccard( 230 | loc_truths, 231 | point_form(pred)) 232 | # (Bipartite Matching) 233 | # [1,num_objects] best prior for each ground truth 234 | # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True) 235 | # [1,num_priors] best ground truth for each prior 236 | 237 | best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True) 238 | best_truth_idx.squeeze_(0) 239 | best_truth_overlap.squeeze_(0) 240 | back_mask = (best_truth_overlap - threshold) < 0 241 | 242 | anchors = torch.FloatTensor(anchors) 243 | if cuda: 244 | anchors = anchors.cuda() 245 | 246 | center_truths = center_size(loc_truths) 247 | 248 | # convert anchor and truths to calculate iou 249 | new_anchors = trans_anchors(anchors) 250 | new_truths = trans_truths(center_truths) 251 | overlaps_ = jaccard( 252 | new_truths, 253 | new_anchors) 254 | best_anchor_overlap, best_anchor_idx = overlaps_.max(1, keepdim=True) 255 | best_anchor_idx.squeeze_(1) 256 | best_anchor_overlap.squeeze_(1) 257 | 258 | encode_truths, fore_mask, scale, back_mask = encode_targets_single(input_wh, center_truths, labels, best_anchor_idx, anchors, anchors_mask, back_mask, grid_wh) 259 | 260 | pred_t[idx] = encode_truths 261 | scale_t[idx] = scale 262 | fore_mask_t[idx] = fore_mask 263 | back_mask_t[idx] = back_mask 264 | 265 | def targets_match_all(input_wh, threshold, targets, pred, anchors, feature_dim, pred_t, scale_t, fore_mask_t, back_mask_t, num_pred, idx, cuda=True): 266 | loc_truths = targets[:, :4].data 267 | labels = targets[:,-1].data 268 | overlaps = jaccard( 269 | loc_truths, 270 | point_form(pred)) 271 | # (Bipartite Matching) 272 | # [1,num_objects] best prior for each ground truth 273 | # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True) 274 | # [1,num_priors] best ground truth for each prior 275 | 276 | best_truth_overlap, best_truth_idx = overlaps.max(0, 
keepdim=True) 277 | best_truth_idx.squeeze_(0) 278 | best_truth_overlap.squeeze_(0) 279 | back_mask = (best_truth_overlap - threshold) < 0 280 | 281 | anchors = torch.FloatTensor(anchors) 282 | if cuda: 283 | anchors = anchors.cuda() 284 | 285 | center_truths = center_size(loc_truths) 286 | new_anchors = trans_anchors(anchors) 287 | new_truths = trans_truths(center_truths) 288 | overlaps_ = jaccard( 289 | new_truths, 290 | new_anchors) 291 | best_anchor_overlap, best_anchor_idx = overlaps_.max(1, keepdim=True) 292 | best_anchor_idx.squeeze_(1) 293 | best_anchor_overlap.squeeze_(1) 294 | 295 | encode_truths, fore_mask, scale, back_mask = encode_targets_all(input_wh, center_truths, labels, best_anchor_idx, anchors, feature_dim, num_pred, back_mask) 296 | 297 | pred_t[idx] = encode_truths 298 | scale_t[idx] = scale 299 | fore_mask_t[idx] = fore_mask 300 | back_mask_t[idx] = back_mask 301 | 302 | def decode(prediction, input_wh, anchors, num_classes, stride_wh, cuda=True): 303 | grid_wh = (input_wh[0] // stride_wh[0], input_wh[1] // stride_wh[1]) 304 | grid_w = np.arange(grid_wh[0]) 305 | grid_h = np.arange(grid_wh[1]) 306 | a,b = np.meshgrid(grid_w, grid_h) 307 | 308 | num_anchors = len(anchors) 309 | x_offset = torch.FloatTensor(a).view(-1,1) 310 | y_offset = torch.FloatTensor(b).view(-1,1) 311 | anchors = [(a[0]/stride_wh[0], a[1]/stride_wh[1]) for a in anchors] 312 | if cuda: 313 | x_offset = x_offset.cuda() 314 | y_offset = y_offset.cuda() 315 | x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1,2).unsqueeze(0) 316 | prediction[:,:,:2] += x_y_offset 317 | anchors = torch.FloatTensor(anchors) 318 | if cuda: 319 | anchors = anchors.cuda() 320 | anchors = anchors.repeat(grid_wh[0]*grid_wh[1], 1).unsqueeze(0) 321 | prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4]) * anchors 322 | prediction[:,:,0] *= stride_wh[0] 323 | prediction[:,:,2] *= stride_wh[0] 324 | prediction[:,:,1] *= stride_wh[1] 325 | prediction[:,:,3] *= stride_wh[1] 326 | return prediction 327 | 328 | def permute_sigmoid(x, input_wh, num_anchors, num_classes): 329 | batch_size = x.size(0) 330 | grid_wh = (x.size(3), x.size(2)) 331 | input_w, input_h = input_wh 332 | stride_wh = (input_w // grid_wh[0], input_h // grid_wh[1]) 333 | bbox_attrs = 5 + num_classes 334 | x = x.view(batch_size, bbox_attrs*num_anchors, grid_wh[0] * grid_wh[1]) 335 | x = x.transpose(1,2).contiguous() 336 | x = x.view(batch_size, grid_wh[0]*grid_wh[1]*num_anchors, bbox_attrs) 337 | x[:,:,0] = torch.sigmoid(x[:,:,0]) 338 | x[:,:,1] = torch.sigmoid(x[:,:,1]) 339 | x[:,:, 4 : bbox_attrs] = torch.sigmoid((x[:,:, 4 : bbox_attrs])) 340 | return x, stride_wh 341 | 342 | def detection_postprecess(detection, iou_thresh, num_classes, input_wh, ori_wh, use_pad=False, nms_conf=0.4): 343 | assert detection.size(0) == 1, "only support batch_size == 1" 344 | conf_mask = (detection[:,:,4] > iou_thresh).float().unsqueeze(2) 345 | detection = detection * conf_mask 346 | try: 347 | ind_nz = torch.nonzero(detection[:,:,4]).transpose(0,1).contiguous() 348 | except: 349 | print("detect no results") 350 | return np.empty([0, 5], dtype=np.float32) 351 | bbox_pred = point_form(detection[:, :, :4].view(-1, 4)) 352 | conf_pred = detection[:, :, 4].view(-1, 1) 353 | cls_pred = detection[:, :, 5:].view(-1, num_classes) 354 | 355 | max_conf, max_conf_idx = torch.max(cls_pred, 1) 356 | 357 | max_conf = max_conf.float().unsqueeze(1) 358 | max_conf_idx = max_conf_idx.float().unsqueeze(1) 359 | 360 | # score = (conf_pred * max_conf).view(-1, 1) 361 | score = 
conf_pred 362 | image_pred = torch.cat((bbox_pred, score, max_conf, max_conf_idx), 1) 363 | 364 | non_zero_ind = (torch.nonzero(image_pred[:,4])) 365 | image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1, 7) 366 | try: 367 | img_classes = unique(image_pred_[:,-1]) 368 | except: 369 | print("no class find") 370 | return np.empty([0, 7], dtype=np.float32) 371 | flag = False 372 | out_out = None 373 | for cls in img_classes: 374 | cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1) 375 | class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze() 376 | 377 | image_pred_class = image_pred_[class_mask_ind].view(-1,7) 378 | keep = nms(image_pred_class.cpu().numpy(), nms_conf, force_cpu=True) 379 | image_pred_class = image_pred_class[keep] 380 | if not flag: 381 | out_put = image_pred_class 382 | flag = True 383 | else: 384 | out_put = torch.cat((out_put, image_pred_class), 0) 385 | 386 | 387 | image_pred_class = out_put 388 | if use_pad: 389 | scaling_factor = min(input_wh[0] / ori_wh[0], input_wh[1] / ori_wh[1]) 390 | image_pred_class[:,[0,2]] -= (input_wh[0] - scaling_factor * ori_wh[0]) / 2 391 | image_pred_class[:,[1,3]] -= (input_wh[1] - scaling_factor * ori_wh[1]) / 2 392 | image_pred_class[:,:4] /= scaling_factor 393 | else: 394 | image_pred_class[:,[0,2]] /= input_wh[0] 395 | image_pred_class[:,[1,3]] /= input_wh[1] 396 | image_pred_class[:, [0,2]] *= ori_wh[0] 397 | image_pred_class[:, [1,3]] *= ori_wh[1] 398 | 399 | for i in range(image_pred_class.shape[0]): 400 | image_pred_class[i, [0,2]] = torch.clamp(image_pred_class[i, [0,2]], 0.0, ori_wh[0]) 401 | image_pred_class[i, [1,3]] = torch.clamp(image_pred_class[i, [1,3]], 0.0, ori_wh[1]) 402 | return image_pred_class.cpu().numpy() 403 | 404 | -------------------------------------------------------------------------------- /utils/build.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import os 9 | from os.path import join as pjoin 10 | import numpy as np 11 | from distutils.core import setup 12 | from distutils.extension import Extension 13 | from Cython.Distutils import build_ext 14 | 15 | 16 | def find_in_path(name, path): 17 | "Find a file in a search path" 18 | # adapted fom http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/ 19 | for dir in path.split(os.pathsep): 20 | binpath = pjoin(dir, name) 21 | if os.path.exists(binpath): 22 | return os.path.abspath(binpath) 23 | return None 24 | 25 | 26 | def locate_cuda(): 27 | """Locate the CUDA environment on the system 28 | 29 | Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64' 30 | and values giving the absolute path to each directory. 31 | 32 | Starts by looking for the CUDAHOME env variable. If not found, everything 33 | is based on finding 'nvcc' in the PATH. 
34 | """ 35 | 36 | # first check if the CUDAHOME env variable is in use 37 | if 'CUDAHOME' in os.environ: 38 | home = os.environ['CUDAHOME'] 39 | nvcc = pjoin(home, 'bin', 'nvcc') 40 | else: 41 | # otherwise, search the PATH for NVCC 42 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin') 43 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path) 44 | if nvcc is None: 45 | raise EnvironmentError('The nvcc binary could not be ' 46 | 'located in your $PATH. Either add it to your path, or set $CUDAHOME') 47 | home = os.path.dirname(os.path.dirname(nvcc)) 48 | 49 | cudaconfig = {'home': home, 'nvcc': nvcc, 50 | 'include': pjoin(home, 'include'), 51 | 'lib64': pjoin(home, 'lib64')} 52 | for k, v in cudaconfig.items(): 53 | if not os.path.exists(v): 54 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v)) 55 | 56 | return cudaconfig 57 | 58 | 59 | CUDA = locate_cuda() 60 | 61 | # Obtain the numpy include directory. This logic works across numpy versions. 62 | try: 63 | numpy_include = np.get_include() 64 | except AttributeError: 65 | numpy_include = np.get_numpy_include() 66 | 67 | 68 | def customize_compiler_for_nvcc(self): 69 | """inject deep into distutils to customize how the dispatch 70 | to gcc/nvcc works. 71 | 72 | If you subclass UnixCCompiler, it's not trivial to get your subclass 73 | injected in, and still have the right customizations (i.e. 74 | distutils.sysconfig.customize_compiler) run on it. So instead of going 75 | the OO route, I have this. Note, it's kindof like a wierd functional 76 | subclassing going on.""" 77 | 78 | # tell the compiler it can processes .cu 79 | self.src_extensions.append('.cu') 80 | 81 | # save references to the default compiler_so and _comple methods 82 | default_compiler_so = self.compiler_so 83 | super = self._compile 84 | 85 | # now redefine the _compile method. This gets executed for each 86 | # object but distutils doesn't have the ability to change compilers 87 | # based on source extension: we add it. 
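    # The replacement below inspects each source file's extension: '.cu' sources are
    # routed to nvcc and get the 'nvcc' entry of the extra_compile_args dict
    # (see ext_modules further down), while every other source keeps the default
    # compiler and the 'gcc' entry. compiler_so is restored after each file,
    # so non-CUDA objects are built exactly as before.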
88 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts): 89 | print(extra_postargs) 90 | if os.path.splitext(src)[1] == '.cu': 91 | # use the cuda for .cu files 92 | self.set_executable('compiler_so', CUDA['nvcc']) 93 | # use only a subset of the extra_postargs, which are 1-1 translated 94 | # from the extra_compile_args in the Extension class 95 | postargs = extra_postargs['nvcc'] 96 | else: 97 | postargs = extra_postargs['gcc'] 98 | 99 | super(obj, src, ext, cc_args, postargs, pp_opts) 100 | # reset the default compiler_so, which we might have changed for cuda 101 | self.compiler_so = default_compiler_so 102 | 103 | # inject our redefined _compile method into the class 104 | self._compile = _compile 105 | 106 | 107 | # run the customize_compiler 108 | class custom_build_ext(build_ext): 109 | def build_extensions(self): 110 | customize_compiler_for_nvcc(self.compiler) 111 | build_ext.build_extensions(self) 112 | 113 | 114 | ext_modules = [ 115 | Extension( 116 | "nms.cpu_nms", 117 | ["nms/cpu_nms.pyx"], 118 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 119 | include_dirs=[numpy_include] 120 | ), 121 | Extension('nms.gpu_nms', 122 | ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'], 123 | library_dirs=[CUDA['lib64']], 124 | libraries=['cudart'], 125 | language='c++', 126 | runtime_library_dirs=[CUDA['lib64']], 127 | # this syntax is specific to this build system 128 | # we're only going to use certain compiler args with nvcc and not with gcc 129 | # the implementation of this trick is in customize_compiler() below 130 | extra_compile_args={'gcc': ["-Wno-unused-function"], 131 | 'nvcc': ['-arch=sm_61', 132 | '--ptxas-options=-v', 133 | '-c', 134 | '--compiler-options', 135 | "'-fPIC'"]}, 136 | include_dirs=[numpy_include, CUDA['include']] 137 | ), 138 | # Extension( 139 | # 'pycocotools._mask', 140 | # sources=['pycocotools/maskApi.c', 'pycocotools/_mask.pyx'], 141 | # include_dirs=[numpy_include, 'pycocotools'], 142 | # extra_compile_args={ 143 | # 'gcc': ['-Wno-cpp', '-Wno-unused-function', '-std=c99']}, 144 | # ), 145 | ] 146 | 147 | setup( 148 | name='mot_utils', 149 | ext_modules=ext_modules, 150 | # inject our custom trigger 151 | cmdclass={'build_ext': custom_build_ext}, 152 | ) 153 | -------------------------------------------------------------------------------- /utils/gen_anchors.py: -------------------------------------------------------------------------------- 1 | import random 2 | import argparse 3 | import numpy as np 4 | import os 5 | import sys 6 | if sys.version_info[0] == 2: 7 | import xml.etree.cElementTree as ET 8 | else: 9 | import xml.etree.ElementTree as ET 10 | import pickle 11 | 12 | import json 13 | 14 | def parse_voc_annotation(ann_dir, img_dir, train_val_list, cache_name, labels=[]): 15 | if os.path.exists(cache_name): 16 | with open(cache_name, 'rb') as handle: 17 | cache = pickle.load(handle) 18 | all_insts, seen_labels = cache['all_insts'], cache['seen_labels'] 19 | else: 20 | all_insts = [] 21 | seen_labels = {} 22 | 23 | for ann in sorted(train_val_list): 24 | img = {'object':[]} 25 | 26 | try: 27 | tree = ET.parse(os.path.join(ann_dir, ann + ".xml")) 28 | except Exception as e: 29 | print(e) 30 | print('Ignore this bad annotation: ' + ann_dir + ann) 31 | continue 32 | 33 | for elem in tree.iter(): 34 | if 'filename' in elem.tag: 35 | img['filename'] = os.path.join(img_dir, elem.text + ".jpg") 36 | if 'width' in elem.tag: 37 | img['width'] = int(elem.text) 38 | if 'height' in elem.tag: 39 | img['height'] = int(elem.text) 40 | if 
'object' in elem.tag or 'part' in elem.tag: 41 | obj = {} 42 | 43 | for attr in list(elem): 44 | if 'name' in attr.tag: 45 | obj['name'] = attr.text 46 | 47 | if obj['name'] in seen_labels: 48 | seen_labels[obj['name']] += 1 49 | else: 50 | seen_labels[obj['name']] = 1 51 | 52 | if len(labels) > 0 and obj['name'] not in labels: 53 | break 54 | else: 55 | img['object'] += [obj] 56 | 57 | if 'bndbox' in attr.tag: 58 | for dim in list(attr): 59 | if 'xmin' in dim.tag: 60 | obj['xmin'] = int(round(float(dim.text))) 61 | if 'ymin' in dim.tag: 62 | obj['ymin'] = int(round(float(dim.text))) 63 | if 'xmax' in dim.tag: 64 | obj['xmax'] = int(round(float(dim.text))) 65 | if 'ymax' in dim.tag: 66 | obj['ymax'] = int(round(float(dim.text))) 67 | 68 | if len(img['object']) > 0: 69 | all_insts += [img] 70 | 71 | cache = {'all_insts': all_insts, 'seen_labels': seen_labels} 72 | with open(cache_name, 'wb') as handle: 73 | pickle.dump(cache, handle, protocol=pickle.HIGHEST_PROTOCOL) 74 | 75 | return all_insts, seen_labels 76 | 77 | def IOU(ann, centroids): 78 | w, h = ann 79 | similarities = [] 80 | 81 | for centroid in centroids: 82 | c_w, c_h = centroid 83 | 84 | if c_w >= w and c_h >= h: 85 | similarity = w*h/(c_w*c_h) 86 | elif c_w >= w and c_h <= h: 87 | similarity = w*c_h/(w*h + (c_w-w)*c_h) 88 | elif c_w <= w and c_h >= h: 89 | similarity = c_w*h/(w*h + c_w*(c_h-h)) 90 | else: #means both w,h are bigger than c_w and c_h respectively 91 | similarity = (c_w*c_h)/(w*h) 92 | similarities.append(similarity) # will become (k,) shape 93 | 94 | return np.array(similarities) 95 | 96 | def avg_IOU(anns, centroids): 97 | n,d = anns.shape 98 | sum = 0. 99 | 100 | for i in range(anns.shape[0]): 101 | sum+= max(IOU(anns[i], centroids)) 102 | 103 | return sum/n 104 | 105 | def print_anchors(centroids): 106 | out_string = '' 107 | 108 | anchors = centroids.copy() 109 | 110 | widths = anchors[:, 0] 111 | sorted_indices = np.argsort(widths) 112 | 113 | r = "anchors: [" 114 | for i in sorted_indices: 115 | out_string += str(int(anchors[i,0]*416)) + ',' + str(int(anchors[i,1]*416)) + ', ' 116 | 117 | print(out_string[:-2]) 118 | 119 | def run_kmeans(ann_dims, anchor_num): 120 | ann_num = ann_dims.shape[0] 121 | iterations = 0 122 | prev_assignments = np.ones(ann_num)*(-1) 123 | iteration = 0 124 | old_distances = np.zeros((ann_num, anchor_num)) 125 | 126 | indices = [random.randrange(ann_dims.shape[0]) for i in range(anchor_num)] 127 | centroids = ann_dims[indices] 128 | anchor_dim = ann_dims.shape[1] 129 | 130 | while True: 131 | distances = [] 132 | iteration += 1 133 | for i in range(ann_num): 134 | d = 1 - IOU(ann_dims[i], centroids) 135 | distances.append(d) 136 | distances = np.array(distances) # distances.shape = (ann_num, anchor_num) 137 | 138 | print("iteration {}: dists = {}".format(iteration, np.sum(np.abs(old_distances-distances)))) 139 | 140 | #assign samples to centroids 141 | assignments = np.argmin(distances,axis=1) 142 | 143 | if (assignments == prev_assignments).all() : 144 | return centroids 145 | 146 | #calculate new centroids 147 | centroid_sums=np.zeros((anchor_num, anchor_dim), np.float) 148 | for i in range(ann_num): 149 | centroid_sums[assignments[i]]+=ann_dims[i] 150 | for j in range(anchor_num): 151 | centroids[j] = centroid_sums[j]/(np.sum(assignments==j) + 1e-6) 152 | 153 | prev_assignments = assignments.copy() 154 | old_distances = distances.copy() 155 | 156 | def _main_(argv): 157 | num_anchors = args.anchors 158 | train_annot_folder = "/localSSD/yyq/VOCdevkit0712/VOC0712/Annotations/" 
159 | train_image_folder = "/localSSD/yyq/VOCdevkit0712/VOC0712/JPEGImages/" 160 | train_val_txt = "/localSSD/yyq/VOCdevkit0712/VOC0712/ImageSets/Main/0712_trainval_test.txt" 161 | with open(train_val_txt, "r") as f: 162 | train_val_list = [i.strip() for i in f.readlines()] 163 | cache_name = "voc_train.pkl" 164 | labels = ["aeroplane", "bicycle", "bird", "boat", 165 | "bottle", "bus", "car", "cat", "chair", 166 | "cow", "diningtable", "dog", "horse", 167 | "motorbike", "person", "pottedplant", 168 | "sheep", "sofa", "train", "tvmonitor"] 169 | 170 | train_imgs, train_labels = parse_voc_annotation( 171 | train_annot_folder, 172 | train_image_folder, 173 | train_val_list, 174 | cache_name, 175 | labels 176 | ) 177 | 178 | # run k_mean to find the anchors 179 | annotation_dims = [] 180 | for image in train_imgs: 181 | # print(image['filename']) 182 | for obj in image['object']: 183 | relative_w = (float(obj['xmax']) - float(obj['xmin']))/image['width'] 184 | relatice_h = (float(obj["ymax"]) - float(obj['ymin']))/image['height'] 185 | annotation_dims.append(tuple(map(float, (relative_w,relatice_h)))) 186 | 187 | annotation_dims = np.array(annotation_dims) 188 | centroids = run_kmeans(annotation_dims, num_anchors) 189 | 190 | # write anchors to file 191 | print('\naverage IOU for', num_anchors, 'anchors:', '%0.2f' % avg_IOU(annotation_dims, centroids)) 192 | print_anchors(centroids) 193 | 194 | if __name__ == '__main__': 195 | argparser = argparse.ArgumentParser() 196 | 197 | argparser.add_argument( 198 | '-c', 199 | '--conf', 200 | default='config.json', 201 | help='path to configuration file') 202 | argparser.add_argument( 203 | '-a', 204 | '--anchors', 205 | default=9, 206 | help='number of anchors to use') 207 | 208 | args = argparser.parse_args() 209 | _main_(args) -------------------------------------------------------------------------------- /utils/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/utils/nms/__init__.py -------------------------------------------------------------------------------- /utils/nms/cpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 12 | return a if a >= b else b 13 | 14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 15 | return a if a <= b else b 16 | 17 | cdef inline np.float32_t abs(np.float32_t a, np.float32_t b): 18 | return a - b if a >= b else b - a 19 | 20 | def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 21 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 22 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 23 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 24 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 25 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 26 | 27 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 28 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 29 | 30 | cdef int ndets = dets.shape[0] 31 | cdef np.ndarray[np.int_t, ndim=1] suppressed = 
\ 32 | np.zeros((ndets), dtype=np.int) 33 | 34 | # nominal indices 35 | cdef int _i, _j 36 | # sorted indices 37 | cdef int i, j 38 | # temp variables for box i's (the box currently under consideration) 39 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 40 | # variables for computing overlap with box j (lower scoring box) 41 | cdef np.float32_t xx1, yy1, xx2, yy2 42 | cdef np.float32_t w, h 43 | cdef np.float32_t inter, ovr 44 | 45 | keep = [] 46 | for _i in range(ndets): 47 | i = order[_i] 48 | if suppressed[i] == 1: 49 | continue 50 | keep.append(i) 51 | ix1 = x1[i] 52 | iy1 = y1[i] 53 | ix2 = x2[i] 54 | iy2 = y2[i] 55 | iarea = areas[i] 56 | for _j in range(_i + 1, ndets): 57 | j = order[_j] 58 | if suppressed[j] == 1: 59 | continue 60 | xx1 = max(ix1, x1[j]) 61 | yy1 = max(iy1, y1[j]) 62 | xx2 = min(ix2, x2[j]) 63 | yy2 = min(iy2, y2[j]) 64 | w = max(0.0, xx2 - xx1 + 1) 65 | h = max(0.0, yy2 - yy1 + 1) 66 | inter = w * h 67 | ovr = inter / (iarea + areas[j] - inter) 68 | if ovr >= thresh: 69 | suppressed[j] = 1 70 | 71 | return keep 72 | 73 | def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0): 74 | cdef unsigned int N = boxes.shape[0] 75 | cdef float iw, ih, box_area 76 | cdef float ua 77 | cdef int pos = 0 78 | cdef float maxscore = 0 79 | cdef int maxpos = 0 80 | cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov 81 | 82 | for i in range(N): 83 | maxscore = boxes[i, 4] 84 | maxpos = i 85 | 86 | tx1 = boxes[i,0] 87 | ty1 = boxes[i,1] 88 | tx2 = boxes[i,2] 89 | ty2 = boxes[i,3] 90 | ts = boxes[i,4] 91 | 92 | pos = i + 1 93 | # get max box 94 | while pos < N: 95 | if maxscore < boxes[pos, 4]: 96 | maxscore = boxes[pos, 4] 97 | maxpos = pos 98 | pos = pos + 1 99 | 100 | # add max box as a detection 101 | boxes[i,0] = boxes[maxpos,0] 102 | boxes[i,1] = boxes[maxpos,1] 103 | boxes[i,2] = boxes[maxpos,2] 104 | boxes[i,3] = boxes[maxpos,3] 105 | boxes[i,4] = boxes[maxpos,4] 106 | 107 | # swap ith box with position of max box 108 | boxes[maxpos,0] = tx1 109 | boxes[maxpos,1] = ty1 110 | boxes[maxpos,2] = tx2 111 | boxes[maxpos,3] = ty2 112 | boxes[maxpos,4] = ts 113 | 114 | tx1 = boxes[i,0] 115 | ty1 = boxes[i,1] 116 | tx2 = boxes[i,2] 117 | ty2 = boxes[i,3] 118 | ts = boxes[i,4] 119 | 120 | pos = i + 1 121 | # NMS iterations, note that N changes if detection boxes fall below threshold 122 | while pos < N: 123 | x1 = boxes[pos, 0] 124 | y1 = boxes[pos, 1] 125 | x2 = boxes[pos, 2] 126 | y2 = boxes[pos, 3] 127 | s = boxes[pos, 4] 128 | 129 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 130 | iw = (min(tx2, x2) - max(tx1, x1) + 1) 131 | if iw > 0: 132 | ih = (min(ty2, y2) - max(ty1, y1) + 1) 133 | if ih > 0: 134 | ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih) 135 | ov = iw * ih / ua #iou between max box and detection box 136 | 137 | if method == 1: # linear 138 | if ov > Nt: 139 | weight = 1 - ov 140 | else: 141 | weight = 1 142 | elif method == 2: # gaussian 143 | weight = np.exp(-(ov * ov)/sigma) 144 | else: # original NMS 145 | if ov > Nt: 146 | weight = 0 147 | else: 148 | weight = 1 149 | 150 | boxes[pos, 4] = weight*boxes[pos, 4] 151 | 152 | # if box score falls below threshold, discard the box by swapping with last box 153 | # update N 154 | if boxes[pos, 4] < threshold: 155 | boxes[pos,0] = boxes[N-1, 0] 156 | boxes[pos,1] = boxes[N-1, 1] 157 | boxes[pos,2] = boxes[N-1, 2] 158 | boxes[pos,3] = boxes[N-1, 3] 159 | boxes[pos,4] = boxes[N-1, 4] 160 | N = N - 1 161 | pos = pos - 1 162 | 163 | pos = pos + 
1 164 | 165 | keep = [i for i in range(N)] 166 | return keep 167 | -------------------------------------------------------------------------------- /utils/nms/gpu_nms.hpp: -------------------------------------------------------------------------------- 1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 2 | int boxes_dim, float nms_overlap_thresh, int device_id); 3 | -------------------------------------------------------------------------------- /utils/nms/gpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | assert sizeof(int) == sizeof(np.int32_t) 12 | 13 | cdef extern from "gpu_nms.hpp": 14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int) 15 | 16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh, 17 | np.int32_t device_id=0): 18 | cdef int boxes_num = dets.shape[0] 19 | cdef int boxes_dim = dets.shape[1] 20 | cdef int num_out 21 | cdef np.ndarray[np.int32_t, ndim=1] \ 22 | keep = np.zeros(boxes_num, dtype=np.int32) 23 | cdef np.ndarray[np.float32_t, ndim=1] \ 24 | scores = dets[:, 4] 25 | cdef np.ndarray[np.int_t, ndim=1] \ 26 | order = scores.argsort()[::-1] 27 | cdef np.ndarray[np.float32_t, ndim=2] \ 28 | sorted_dets = dets[order, :] 29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id) 30 | keep = keep[:num_out] 31 | return list(order[keep]) 32 | -------------------------------------------------------------------------------- /utils/nms/nms_kernel.cu: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------ 2 | // Faster R-CNN 3 | // Copyright (c) 2015 Microsoft 4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details] 5 | // Written by Shaoqing Ren 6 | // ------------------------------------------------------------------ 7 | 8 | #include "gpu_nms.hpp" 9 | #include 10 | #include 11 | 12 | #define CUDA_CHECK(condition) \ 13 | /* Code block avoids redefinition of cudaError_t error */ \ 14 | do { \ 15 | cudaError_t error = condition; \ 16 | if (error != cudaSuccess) { \ 17 | std::cout << cudaGetErrorString(error) << std::endl; \ 18 | } \ 19 | } while (0) 20 | 21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 22 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 23 | 24 | __device__ inline float devIoU(float const * const a, float const * const b) { 25 | float left = max(a[0], b[0]), right = min(a[2], b[2]); 26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]); 27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f); 28 | float interS = width * height; 29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1); 30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1); 31 | return interS / (Sa + Sb - interS); 32 | } 33 | 34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh, 35 | const float *dev_boxes, unsigned long long *dev_mask) { 36 | const int row_start = blockIdx.y; 37 | const int col_start = blockIdx.x; 38 | 39 | // if (row_start > col_start) return; 40 | 41 | const int row_size = 42 | min(n_boxes - row_start * threadsPerBlock, 
threadsPerBlock); 43 | const int col_size = 44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 45 | 46 | __shared__ float block_boxes[threadsPerBlock * 5]; 47 | if (threadIdx.x < col_size) { 48 | block_boxes[threadIdx.x * 5 + 0] = 49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0]; 50 | block_boxes[threadIdx.x * 5 + 1] = 51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1]; 52 | block_boxes[threadIdx.x * 5 + 2] = 53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2]; 54 | block_boxes[threadIdx.x * 5 + 3] = 55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3]; 56 | block_boxes[threadIdx.x * 5 + 4] = 57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4]; 58 | } 59 | __syncthreads(); 60 | 61 | if (threadIdx.x < row_size) { 62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 63 | const float *cur_box = dev_boxes + cur_box_idx * 5; 64 | int i = 0; 65 | unsigned long long t = 0; 66 | int start = 0; 67 | if (row_start == col_start) { 68 | start = threadIdx.x + 1; 69 | } 70 | for (i = start; i < col_size; i++) { 71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) { 72 | t |= 1ULL << i; 73 | } 74 | } 75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock); 76 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 77 | } 78 | } 79 | 80 | void _set_device(int device_id) { 81 | int current_device; 82 | CUDA_CHECK(cudaGetDevice(¤t_device)); 83 | if (current_device == device_id) { 84 | return; 85 | } 86 | // The call to cudaSetDevice must come before any calls to Get, which 87 | // may perform initialization using the GPU. 88 | CUDA_CHECK(cudaSetDevice(device_id)); 89 | } 90 | 91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 92 | int boxes_dim, float nms_overlap_thresh, int device_id) { 93 | _set_device(device_id); 94 | 95 | float* boxes_dev = NULL; 96 | unsigned long long* mask_dev = NULL; 97 | 98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock); 99 | 100 | CUDA_CHECK(cudaMalloc(&boxes_dev, 101 | boxes_num * boxes_dim * sizeof(float))); 102 | CUDA_CHECK(cudaMemcpy(boxes_dev, 103 | boxes_host, 104 | boxes_num * boxes_dim * sizeof(float), 105 | cudaMemcpyHostToDevice)); 106 | 107 | CUDA_CHECK(cudaMalloc(&mask_dev, 108 | boxes_num * col_blocks * sizeof(unsigned long long))); 109 | 110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock), 111 | DIVUP(boxes_num, threadsPerBlock)); 112 | dim3 threads(threadsPerBlock); 113 | nms_kernel<<>>(boxes_num, 114 | nms_overlap_thresh, 115 | boxes_dev, 116 | mask_dev); 117 | 118 | std::vector mask_host(boxes_num * col_blocks); 119 | CUDA_CHECK(cudaMemcpy(&mask_host[0], 120 | mask_dev, 121 | sizeof(unsigned long long) * boxes_num * col_blocks, 122 | cudaMemcpyDeviceToHost)); 123 | 124 | std::vector remv(col_blocks); 125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 126 | 127 | int num_to_keep = 0; 128 | for (int i = 0; i < boxes_num; i++) { 129 | int nblock = i / threadsPerBlock; 130 | int inblock = i % threadsPerBlock; 131 | 132 | if (!(remv[nblock] & (1ULL << inblock))) { 133 | keep_out[num_to_keep++] = i; 134 | unsigned long long *p = &mask_host[0] + i * col_blocks; 135 | for (int j = nblock; j < col_blocks; j++) { 136 | remv[j] |= p[j]; 137 | } 138 | } 139 | } 140 | *num_out = num_to_keep; 141 | 142 | CUDA_CHECK(cudaFree(boxes_dev)); 143 | CUDA_CHECK(cudaFree(mask_dev)); 144 | } 145 | -------------------------------------------------------------------------------- 
/utils/nms/py_cpu_nms.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | def py_cpu_nms(dets, thresh): 11 | """Pure Python NMS baseline.""" 12 | x1 = dets[:, 0] 13 | y1 = dets[:, 1] 14 | x2 = dets[:, 2] 15 | y2 = dets[:, 3] 16 | scores = dets[:, 4] 17 | 18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 19 | order = scores.argsort()[::-1] 20 | 21 | keep = [] 22 | while order.size > 0: 23 | i = order[0] 24 | keep.append(i) 25 | xx1 = np.maximum(x1[i], x1[order[1:]]) 26 | yy1 = np.maximum(y1[i], y1[order[1:]]) 27 | xx2 = np.minimum(x2[i], x2[order[1:]]) 28 | yy2 = np.minimum(y2[i], y2[order[1:]]) 29 | 30 | w = np.maximum(0.0, xx2 - xx1 + 1) 31 | h = np.maximum(0.0, yy2 - yy1 + 1) 32 | inter = w * h 33 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 34 | 35 | inds = np.where(ovr <= thresh)[0] 36 | order = order[inds + 1] 37 | 38 | return keep 39 | -------------------------------------------------------------------------------- /utils/nms_wrapper.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | from .nms.cpu_nms import cpu_nms, cpu_soft_nms 9 | from .nms.gpu_nms import gpu_nms 10 | 11 | def nms(dets, thresh, force_cpu=False): 12 | """Dispatch to either CPU or GPU NMS implementations.""" 13 | 14 | if dets.shape[0] == 0: 15 | return [] 16 | if force_cpu: 17 | #return cpu_soft_nms(dets, thresh, method = 0) 18 | return cpu_nms(dets, thresh) 19 | return gpu_nms(dets, thresh) 20 | 21 | def soft_nms(dets, Nt=0.3, sigma=0.5, thresh=0.001, method=1): 22 | """Dispatch to either CPU or GPU NMS implementations.""" 23 | 24 | if dets.shape[0] == 0: 25 | return [] 26 | return cpu_soft_nms(dets, sigma, Nt, thresh, method) -------------------------------------------------------------------------------- /utils/preprocess.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | import numpy as np 8 | import cv2 9 | 10 | 11 | def letterbox_image(img, resize_wh): 12 | '''resize image with unchanged aspect ratio using padding''' 13 | img_w, img_h = img.shape[1], img.shape[0] 14 | w, h = resize_wh 15 | new_w = int(img_w * min(w/img_w, h/img_h)) 16 | new_h = int(img_h * min(w/img_w, h/img_h)) 17 | resized_image = cv2.resize(img, (new_w, new_h), interpolation = cv2.INTER_CUBIC) 18 | 19 | canvas = np.full((resize_wh[1], resize_wh[0], 3), 128) 20 | 21 | canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image 22 | 23 | return canvas 24 | 25 | def preproc_for_test(img, resize_wh, use_pad=False): 26 | if not use_pad: 27 | img = cv2.resize(img, resize_wh) 28 | else: 29 | img = letterbox_image(img, resize_wh) 30 | img_ = img[:,:,::-1].transpose((2,0,1)).copy() 31 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0) 32 | return img_ 33 | 34 | 
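> The resize helpers above are the counterpart of `detection_postprecess` in `utils/box_utils.py`: with `use_pad=True` the image is letterboxed on the way in and the padding is undone on the way out. Below is a minimal single-image inference sketch of how the two sides fit together; the helper name, the import paths, the class count, and the assumption that the network returns the concatenated, decoded predictions of shape `(1, N, 5 + num_classes)` are ours, not taken from `demo.py`, which is the actual entry point.

```python
import cv2
import torch
from utils.preprocess import preproc_for_test
from utils.box_utils import detection_postprecess

def detect_single_image(net, img_path, input_wh=(416, 416), num_classes=80):
    """Run one image through an already-built Yolov3 `net` (assumed to return
    the concatenated, decoded predictions of shape (1, N, 5 + num_classes))."""
    img = cv2.imread(img_path)
    ori_wh = (img.shape[1], img.shape[0])              # original (width, height)
    # letterbox to input_wh, BGR->RGB, HWC->CHW, scale to [0, 1], add batch dim
    x = preproc_for_test(img, input_wh, use_pad=True)
    with torch.no_grad():
        detection = net(x.cuda())
    # threshold on objectness, per-class NMS, then undo the letterbox padding
    # and rescale the boxes back to the original image size;
    # each row of the result: x1, y1, x2, y2, objectness, class score, class index
    return detection_postprecess(detection, 0.5, num_classes, input_wh, ori_wh,
                                 use_pad=True, nms_conf=0.4)
```

> With `use_pad=False` the preprocessing is a plain `cv2.resize` and `detection_postprecess` simply rescales by the width/height ratios instead of removing the letterbox padding.
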
-------------------------------------------------------------------------------- /utils/pycocotools/__init__.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tylin' 2 | -------------------------------------------------------------------------------- /utils/pycocotools/_mask.pyx: -------------------------------------------------------------------------------- 1 | # distutils: language = c 2 | # distutils: sources = ../common/maskApi.c 3 | 4 | #************************************************************************** 5 | # Microsoft COCO Toolbox. version 2.0 6 | # Data, paper, and tutorials available at: http://mscoco.org/ 7 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 8 | # Licensed under the Simplified BSD License [see coco/license.txt] 9 | #************************************************************************** 10 | 11 | __author__ = 'tsungyi' 12 | 13 | import sys 14 | PYTHON_VERSION = sys.version_info[0] 15 | 16 | # import both Python-level and C-level symbols of Numpy 17 | # the API uses Numpy to interface C and Python 18 | import numpy as np 19 | cimport numpy as np 20 | from libc.stdlib cimport malloc, free 21 | 22 | # intialized Numpy. must do. 23 | np.import_array() 24 | 25 | # import numpy C function 26 | # we use PyArray_ENABLEFLAGS to make Numpy ndarray responsible to memoery management 27 | cdef extern from "numpy/arrayobject.h": 28 | void PyArray_ENABLEFLAGS(np.ndarray arr, int flags) 29 | 30 | # Declare the prototype of the C functions in MaskApi.h 31 | cdef extern from "maskApi.h": 32 | ctypedef unsigned int uint 33 | ctypedef unsigned long siz 34 | ctypedef unsigned char byte 35 | ctypedef double* BB 36 | ctypedef struct RLE: 37 | siz h, 38 | siz w, 39 | siz m, 40 | uint* cnts, 41 | void rlesInit( RLE **R, siz n ) 42 | void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n ) 43 | void rleDecode( const RLE *R, byte *mask, siz n ) 44 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect ) 45 | void rleArea( const RLE *R, siz n, uint *a ) 46 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ) 47 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) 48 | void rleToBbox( const RLE *R, BB bb, siz n ) 49 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ) 50 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ) 51 | char* rleToString( const RLE *R ) 52 | void rleFrString( RLE *R, char *s, siz h, siz w ) 53 | 54 | # python class to wrap RLE array in C 55 | # the class handles the memory allocation and deallocation 56 | cdef class RLEs: 57 | cdef RLE *_R 58 | cdef siz _n 59 | 60 | def __cinit__(self, siz n =0): 61 | rlesInit(&self._R, n) 62 | self._n = n 63 | 64 | # free the RLE array here 65 | def __dealloc__(self): 66 | if self._R is not NULL: 67 | for i in range(self._n): 68 | free(self._R[i].cnts) 69 | free(self._R) 70 | def __getattr__(self, key): 71 | if key == 'n': 72 | return self._n 73 | raise AttributeError(key) 74 | 75 | # python class to wrap Mask array in C 76 | # the class handles the memory allocation and deallocation 77 | cdef class Masks: 78 | cdef byte *_mask 79 | cdef siz _h 80 | cdef siz _w 81 | cdef siz _n 82 | 83 | def __cinit__(self, h, w, n): 84 | self._mask = malloc(h*w*n* sizeof(byte)) 85 | self._h = h 86 | self._w = w 87 | self._n = n 88 | # def __dealloc__(self): 89 | # the memory management of _mask has been passed to np.ndarray 90 | # it doesn't need to be freed here 91 | 92 | # called when passing into np.array() and 
return an np.ndarray in column-major order 93 | def __array__(self): 94 | cdef np.npy_intp shape[1] 95 | shape[0] = self._h*self._w*self._n 96 | # Create a 1D array, and reshape it to fortran/Matlab column-major array 97 | ndarray = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT8, self._mask).reshape((self._h, self._w, self._n), order='F') 98 | # The _mask allocated by Masks is now handled by ndarray 99 | PyArray_ENABLEFLAGS(ndarray, np.NPY_OWNDATA) 100 | return ndarray 101 | 102 | # internal conversion from Python RLEs object to compressed RLE format 103 | def _toString(RLEs Rs): 104 | cdef siz n = Rs.n 105 | cdef bytes py_string 106 | cdef char* c_string 107 | objs = [] 108 | for i in range(n): 109 | c_string = rleToString( &Rs._R[i] ) 110 | py_string = c_string 111 | objs.append({ 112 | 'size': [Rs._R[i].h, Rs._R[i].w], 113 | 'counts': py_string 114 | }) 115 | free(c_string) 116 | return objs 117 | 118 | # internal conversion from compressed RLE format to Python RLEs object 119 | def _frString(rleObjs): 120 | cdef siz n = len(rleObjs) 121 | Rs = RLEs(n) 122 | cdef bytes py_string 123 | cdef char* c_string 124 | for i, obj in enumerate(rleObjs): 125 | if PYTHON_VERSION == 2: 126 | py_string = str(obj['counts']).encode('utf8') 127 | elif PYTHON_VERSION == 3: 128 | py_string = str.encode(obj['counts']) if type(obj['counts']) == str else obj['counts'] 129 | else: 130 | raise Exception('Python version must be 2 or 3') 131 | c_string = py_string 132 | rleFrString( &Rs._R[i], c_string, obj['size'][0], obj['size'][1] ) 133 | return Rs 134 | 135 | # encode mask to RLEs objects 136 | # list of RLE string can be generated by RLEs member function 137 | def encode(np.ndarray[np.uint8_t, ndim=3, mode='fortran'] mask): 138 | h, w, n = mask.shape[0], mask.shape[1], mask.shape[2] 139 | cdef RLEs Rs = RLEs(n) 140 | rleEncode(Rs._R,mask.data,h,w,n) 141 | objs = _toString(Rs) 142 | return objs 143 | 144 | # decode mask from compressed list of RLE string or RLEs object 145 | def decode(rleObjs): 146 | cdef RLEs Rs = _frString(rleObjs) 147 | h, w, n = Rs._R[0].h, Rs._R[0].w, Rs._n 148 | masks = Masks(h, w, n) 149 | rleDecode(Rs._R, masks._mask, n); 150 | return np.array(masks) 151 | 152 | def merge(rleObjs, intersect=0): 153 | cdef RLEs Rs = _frString(rleObjs) 154 | cdef RLEs R = RLEs(1) 155 | rleMerge(Rs._R, R._R, Rs._n, intersect) 156 | obj = _toString(R)[0] 157 | return obj 158 | 159 | def area(rleObjs): 160 | cdef RLEs Rs = _frString(rleObjs) 161 | cdef uint* _a = malloc(Rs._n* sizeof(uint)) 162 | rleArea(Rs._R, Rs._n, _a) 163 | cdef np.npy_intp shape[1] 164 | shape[0] = Rs._n 165 | a = np.array((Rs._n, ), dtype=np.uint8) 166 | a = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT32, _a) 167 | PyArray_ENABLEFLAGS(a, np.NPY_OWNDATA) 168 | return a 169 | 170 | # iou computation. support function overload (RLEs-RLEs and bbox-bbox). 
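# dt and gt may each be given as an Nx4 ndarray/list of [x y w h] boxes or as a
# list of RLE dicts; _preproc normalizes both forms and the call is then
# dispatched to bbIou or rleIou. Entries flagged in pyiscrowd are treated as
# crowd regions, i.e. a detection only needs to cover a subregion of that gt.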
171 | def iou( dt, gt, pyiscrowd ): 172 | def _preproc(objs): 173 | if len(objs) == 0: 174 | return objs 175 | if type(objs) == np.ndarray: 176 | if len(objs.shape) == 1: 177 | objs = objs.reshape((objs[0], 1)) 178 | # check if it's Nx4 bbox 179 | if not len(objs.shape) == 2 or not objs.shape[1] == 4: 180 | raise Exception('numpy ndarray input is only for *bounding boxes* and should have Nx4 dimension') 181 | objs = objs.astype(np.double) 182 | elif type(objs) == list: 183 | # check if list is in box format and convert it to np.ndarray 184 | isbox = np.all(np.array([(len(obj)==4) and ((type(obj)==list) or (type(obj)==np.ndarray)) for obj in objs])) 185 | isrle = np.all(np.array([type(obj) == dict for obj in objs])) 186 | if isbox: 187 | objs = np.array(objs, dtype=np.double) 188 | if len(objs.shape) == 1: 189 | objs = objs.reshape((1,objs.shape[0])) 190 | elif isrle: 191 | objs = _frString(objs) 192 | else: 193 | raise Exception('list input can be bounding box (Nx4) or RLEs ([RLE])') 194 | else: 195 | raise Exception('unrecognized type. The following type: RLEs (rle), np.ndarray (box), and list (box) are supported.') 196 | return objs 197 | def _rleIou(RLEs dt, RLEs gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou): 198 | rleIou( dt._R, gt._R, m, n, iscrowd.data, _iou.data ) 199 | def _bbIou(np.ndarray[np.double_t, ndim=2] dt, np.ndarray[np.double_t, ndim=2] gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou): 200 | bbIou( dt.data, gt.data, m, n, iscrowd.data, _iou.data ) 201 | def _len(obj): 202 | cdef siz N = 0 203 | if type(obj) == RLEs: 204 | N = obj.n 205 | elif len(obj)==0: 206 | pass 207 | elif type(obj) == np.ndarray: 208 | N = obj.shape[0] 209 | return N 210 | # convert iscrowd to numpy array 211 | cdef np.ndarray[np.uint8_t, ndim=1] iscrowd = np.array(pyiscrowd, dtype=np.uint8) 212 | # simple type checking 213 | cdef siz m, n 214 | dt = _preproc(dt) 215 | gt = _preproc(gt) 216 | m = _len(dt) 217 | n = _len(gt) 218 | if m == 0 or n == 0: 219 | return [] 220 | if not type(dt) == type(gt): 221 | raise Exception('The dt and gt should have the same data type, either RLEs, list or np.ndarray') 222 | 223 | # define local variables 224 | cdef double* _iou = 0 225 | cdef np.npy_intp shape[1] 226 | # check type and assign iou function 227 | if type(dt) == RLEs: 228 | _iouFun = _rleIou 229 | elif type(dt) == np.ndarray: 230 | _iouFun = _bbIou 231 | else: 232 | raise Exception('input data type not allowed.') 233 | _iou = malloc(m*n* sizeof(double)) 234 | iou = np.zeros((m*n, ), dtype=np.double) 235 | shape[0] = m*n 236 | iou = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _iou) 237 | PyArray_ENABLEFLAGS(iou, np.NPY_OWNDATA) 238 | _iouFun(dt, gt, iscrowd, m, n, iou) 239 | return iou.reshape((m,n), order='F') 240 | 241 | def toBbox( rleObjs ): 242 | cdef RLEs Rs = _frString(rleObjs) 243 | cdef siz n = Rs.n 244 | cdef BB _bb = malloc(4*n* sizeof(double)) 245 | rleToBbox( Rs._R, _bb, n ) 246 | cdef np.npy_intp shape[1] 247 | shape[0] = 4*n 248 | bb = np.array((1,4*n), dtype=np.double) 249 | bb = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _bb).reshape((n, 4)) 250 | PyArray_ENABLEFLAGS(bb, np.NPY_OWNDATA) 251 | return bb 252 | 253 | def frBbox(np.ndarray[np.double_t, ndim=2] bb, siz h, siz w ): 254 | cdef siz n = bb.shape[0] 255 | Rs = RLEs(n) 256 | rleFrBbox( Rs._R, bb.data, h, w, n ) 257 | objs = _toString(Rs) 258 | return objs 259 | 260 | def frPoly( poly, siz h, siz w ): 261 | cdef 
np.ndarray[np.double_t, ndim=1] np_poly 262 | n = len(poly) 263 | Rs = RLEs(n) 264 | for i, p in enumerate(poly): 265 | np_poly = np.array(p, dtype=np.double, order='F') 266 | rleFrPoly( &Rs._R[i], np_poly.data, int(len(p)/2), h, w ) 267 | objs = _toString(Rs) 268 | return objs 269 | 270 | def frUncompressedRLE(ucRles, siz h, siz w): 271 | cdef np.ndarray[np.uint32_t, ndim=1] cnts 272 | cdef RLE R 273 | cdef uint *data 274 | n = len(ucRles) 275 | objs = [] 276 | for i in range(n): 277 | Rs = RLEs(1) 278 | cnts = np.array(ucRles[i]['counts'], dtype=np.uint32) 279 | # time for malloc can be saved here but it's fine 280 | data = malloc(len(cnts)* sizeof(uint)) 281 | for j in range(len(cnts)): 282 | data[j] = cnts[j] 283 | R = RLE(ucRles[i]['size'][0], ucRles[i]['size'][1], len(cnts), data) 284 | Rs._R[0] = R 285 | objs.append(_toString(Rs)[0]) 286 | return objs 287 | 288 | def frPyObjects(pyobj, h, w): 289 | # encode rle from a list of python objects 290 | if type(pyobj) == np.ndarray: 291 | objs = frBbox(pyobj, h, w) 292 | elif type(pyobj) == list and len(pyobj[0]) == 4: 293 | objs = frBbox(pyobj, h, w) 294 | elif type(pyobj) == list and len(pyobj[0]) > 4: 295 | objs = frPoly(pyobj, h, w) 296 | elif type(pyobj) == list and type(pyobj[0]) == dict \ 297 | and 'counts' in pyobj[0] and 'size' in pyobj[0]: 298 | objs = frUncompressedRLE(pyobj, h, w) 299 | # encode rle from single python object 300 | elif type(pyobj) == list and len(pyobj) == 4: 301 | objs = frBbox([pyobj], h, w)[0] 302 | elif type(pyobj) == list and len(pyobj) > 4: 303 | objs = frPoly([pyobj], h, w)[0] 304 | elif type(pyobj) == dict and 'counts' in pyobj and 'size' in pyobj: 305 | objs = frUncompressedRLE([pyobj], h, w)[0] 306 | else: 307 | raise Exception('input type is not supported.') 308 | return objs 309 | -------------------------------------------------------------------------------- /utils/pycocotools/coco.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tylin' 2 | __version__ = '2.0' 3 | # Interface for accessing the Microsoft COCO dataset. 4 | 5 | # Microsoft COCO is a large image dataset designed for object detection, 6 | # segmentation, and caption generation. pycocotools is a Python API that 7 | # assists in loading, parsing and visualizing the annotations in COCO. 8 | # Please visit http://mscoco.org/ for more information on COCO, including 9 | # for the data, paper, and tutorials. The exact format of the annotations 10 | # is also described on the COCO website. For example usage of the pycocotools 11 | # please see pycocotools_demo.ipynb. In addition to this API, please download both 12 | # the COCO images and annotations in order to run the demo. 13 | 14 | # An alternative to using the API is to load the annotations directly 15 | # into Python dictionary 16 | # Using the API provides additional utility functions. Note that this API 17 | # supports both *instance* and *caption* annotations. In the case of 18 | # captions not all functions are defined (e.g. categories are undefined). 19 | 20 | # The following API functions are defined: 21 | # COCO - COCO api class that loads COCO annotation file and prepare data structures. 22 | # decodeMask - Decode binary mask M encoded via run-length encoding. 23 | # encodeMask - Encode binary mask M using run-length encoding. 24 | # getAnnIds - Get ann ids that satisfy given filter conditions. 25 | # getCatIds - Get cat ids that satisfy given filter conditions. 
26 | # getImgIds - Get img ids that satisfy given filter conditions. 27 | # loadAnns - Load anns with the specified ids. 28 | # loadCats - Load cats with the specified ids. 29 | # loadImgs - Load imgs with the specified ids. 30 | # annToMask - Convert segmentation in an annotation to binary mask. 31 | # showAnns - Display the specified annotations. 32 | # loadRes - Load algorithm results and create API for accessing them. 33 | # download - Download COCO images from mscoco.org server. 34 | # Throughout the API "ann"=annotation, "cat"=category, and "img"=image. 35 | # Help on each functions can be accessed by: "help COCO>function". 36 | 37 | # See also COCO>decodeMask, 38 | # COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds, 39 | # COCO>getImgIds, COCO>loadAnns, COCO>loadCats, 40 | # COCO>loadImgs, COCO>annToMask, COCO>showAnns 41 | 42 | # Microsoft COCO Toolbox. version 2.0 43 | # Data, paper, and tutorials available at: http://mscoco.org/ 44 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2014. 45 | # Licensed under the Simplified BSD License [see bsd.txt] 46 | 47 | import json 48 | import time 49 | import matplotlib.pyplot as plt 50 | from matplotlib.collections import PatchCollection 51 | from matplotlib.patches import Polygon 52 | import numpy as np 53 | import copy 54 | import itertools 55 | from . import mask as maskUtils 56 | import os 57 | from collections import defaultdict 58 | import sys 59 | PYTHON_VERSION = sys.version_info[0] 60 | if PYTHON_VERSION == 2: 61 | from urllib import urlretrieve 62 | elif PYTHON_VERSION == 3: 63 | from urllib.request import urlretrieve 64 | 65 | class COCO: 66 | def __init__(self, annotation_file=None): 67 | """ 68 | Constructor of Microsoft COCO helper class for reading and visualizing annotations. 69 | :param annotation_file (str): location of annotation file 70 | :param image_folder (str): location to the folder that hosts images. 71 | :return: 72 | """ 73 | # load dataset 74 | self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict() 75 | self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list) 76 | if not annotation_file == None: 77 | print('loading annotations into memory...') 78 | tic = time.time() 79 | dataset = json.load(open(annotation_file, 'r')) 80 | assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset)) 81 | print('Done (t={:0.2f}s)'.format(time.time()- tic)) 82 | self.dataset = dataset 83 | self.createIndex() 84 | 85 | def createIndex(self): 86 | # create index 87 | print('creating index...') 88 | anns, cats, imgs = {}, {}, {} 89 | imgToAnns,catToImgs = defaultdict(list),defaultdict(list) 90 | if 'annotations' in self.dataset: 91 | for ann in self.dataset['annotations']: 92 | imgToAnns[ann['image_id']].append(ann) 93 | anns[ann['id']] = ann 94 | 95 | if 'images' in self.dataset: 96 | for img in self.dataset['images']: 97 | imgs[img['id']] = img 98 | 99 | if 'categories' in self.dataset: 100 | for cat in self.dataset['categories']: 101 | cats[cat['id']] = cat 102 | 103 | if 'annotations' in self.dataset and 'categories' in self.dataset: 104 | for ann in self.dataset['annotations']: 105 | catToImgs[ann['category_id']].append(ann['image_id']) 106 | 107 | print('index created!') 108 | 109 | # create class members 110 | self.anns = anns 111 | self.imgToAnns = imgToAnns 112 | self.catToImgs = catToImgs 113 | self.imgs = imgs 114 | self.cats = cats 115 | 116 | def info(self): 117 | """ 118 | Print information about the annotation file. 
119 | :return: 120 | """ 121 | for key, value in self.dataset['info'].items(): 122 | print('{}: {}'.format(key, value)) 123 | 124 | def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None): 125 | """ 126 | Get ann ids that satisfy given filter conditions. default skips that filter 127 | :param imgIds (int array) : get anns for given imgs 128 | catIds (int array) : get anns for given cats 129 | areaRng (float array) : get anns for given area range (e.g. [0 inf]) 130 | iscrowd (boolean) : get anns for given crowd label (False or True) 131 | :return: ids (int array) : integer array of ann ids 132 | """ 133 | imgIds = imgIds if type(imgIds) == list else [imgIds] 134 | catIds = catIds if type(catIds) == list else [catIds] 135 | 136 | if len(imgIds) == len(catIds) == len(areaRng) == 0: 137 | anns = self.dataset['annotations'] 138 | else: 139 | if not len(imgIds) == 0: 140 | lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns] 141 | anns = list(itertools.chain.from_iterable(lists)) 142 | else: 143 | anns = self.dataset['annotations'] 144 | anns = anns if len(catIds) == 0 else [ann for ann in anns if ann['category_id'] in catIds] 145 | anns = anns if len(areaRng) == 0 else [ann for ann in anns if ann['area'] > areaRng[0] and ann['area'] < areaRng[1]] 146 | if not iscrowd == None: 147 | ids = [ann['id'] for ann in anns if ann['iscrowd'] == iscrowd] 148 | else: 149 | ids = [ann['id'] for ann in anns] 150 | return ids 151 | 152 | def getCatIds(self, catNms=[], supNms=[], catIds=[]): 153 | """ 154 | filtering parameters. default skips that filter. 155 | :param catNms (str array) : get cats for given cat names 156 | :param supNms (str array) : get cats for given supercategory names 157 | :param catIds (int array) : get cats for given cat ids 158 | :return: ids (int array) : integer array of cat ids 159 | """ 160 | catNms = catNms if type(catNms) == list else [catNms] 161 | supNms = supNms if type(supNms) == list else [supNms] 162 | catIds = catIds if type(catIds) == list else [catIds] 163 | 164 | if len(catNms) == len(supNms) == len(catIds) == 0: 165 | cats = self.dataset['categories'] 166 | else: 167 | cats = self.dataset['categories'] 168 | cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms] 169 | cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms] 170 | cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds] 171 | ids = [cat['id'] for cat in cats] 172 | return ids 173 | 174 | def getImgIds(self, imgIds=[], catIds=[]): 175 | ''' 176 | Get img ids that satisfy given filter conditions. 177 | :param imgIds (int array) : get imgs for given ids 178 | :param catIds (int array) : get imgs with all given cats 179 | :return: ids (int array) : integer array of img ids 180 | ''' 181 | imgIds = imgIds if type(imgIds) == list else [imgIds] 182 | catIds = catIds if type(catIds) == list else [catIds] 183 | 184 | if len(imgIds) == len(catIds) == 0: 185 | ids = self.imgs.keys() 186 | else: 187 | ids = set(imgIds) 188 | for i, catId in enumerate(catIds): 189 | if i == 0 and len(ids) == 0: 190 | ids = set(self.catToImgs[catId]) 191 | else: 192 | ids &= set(self.catToImgs[catId]) 193 | return list(ids) 194 | 195 | def loadAnns(self, ids=[]): 196 | """ 197 | Load anns with the specified ids. 
198 | :param ids (int array) : integer ids specifying anns 199 | :return: anns (object array) : loaded ann objects 200 | """ 201 | if type(ids) == list: 202 | return [self.anns[id] for id in ids] 203 | elif type(ids) == int: 204 | return [self.anns[ids]] 205 | 206 | def loadCats(self, ids=[]): 207 | """ 208 | Load cats with the specified ids. 209 | :param ids (int array) : integer ids specifying cats 210 | :return: cats (object array) : loaded cat objects 211 | """ 212 | if type(ids) == list: 213 | return [self.cats[id] for id in ids] 214 | elif type(ids) == int: 215 | return [self.cats[ids]] 216 | 217 | def loadImgs(self, ids=[]): 218 | """ 219 | Load anns with the specified ids. 220 | :param ids (int array) : integer ids specifying img 221 | :return: imgs (object array) : loaded img objects 222 | """ 223 | if type(ids) == list: 224 | return [self.imgs[id] for id in ids] 225 | elif type(ids) == int: 226 | return [self.imgs[ids]] 227 | 228 | def showAnns(self, anns): 229 | """ 230 | Display the specified annotations. 231 | :param anns (array of object): annotations to display 232 | :return: None 233 | """ 234 | if len(anns) == 0: 235 | return 0 236 | if 'segmentation' in anns[0] or 'keypoints' in anns[0]: 237 | datasetType = 'instances' 238 | elif 'caption' in anns[0]: 239 | datasetType = 'captions' 240 | else: 241 | raise Exception('datasetType not supported') 242 | if datasetType == 'instances': 243 | ax = plt.gca() 244 | ax.set_autoscale_on(False) 245 | polygons = [] 246 | color = [] 247 | for ann in anns: 248 | c = (np.random.random((1, 3))*0.6+0.4).tolist()[0] 249 | if 'segmentation' in ann: 250 | if type(ann['segmentation']) == list: 251 | # polygon 252 | for seg in ann['segmentation']: 253 | poly = np.array(seg).reshape((int(len(seg)/2), 2)) 254 | polygons.append(Polygon(poly)) 255 | color.append(c) 256 | else: 257 | # mask 258 | t = self.imgs[ann['image_id']] 259 | if type(ann['segmentation']['counts']) == list: 260 | rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width']) 261 | else: 262 | rle = [ann['segmentation']] 263 | m = maskUtils.decode(rle) 264 | img = np.ones( (m.shape[0], m.shape[1], 3) ) 265 | if ann['iscrowd'] == 1: 266 | color_mask = np.array([2.0,166.0,101.0])/255 267 | if ann['iscrowd'] == 0: 268 | color_mask = np.random.random((1, 3)).tolist()[0] 269 | for i in range(3): 270 | img[:,:,i] = color_mask[i] 271 | ax.imshow(np.dstack( (img, m*0.5) )) 272 | if 'keypoints' in ann and type(ann['keypoints']) == list: 273 | # turn skeleton into zero-based index 274 | sks = np.array(self.loadCats(ann['category_id'])[0]['skeleton'])-1 275 | kp = np.array(ann['keypoints']) 276 | x = kp[0::3] 277 | y = kp[1::3] 278 | v = kp[2::3] 279 | for sk in sks: 280 | if np.all(v[sk]>0): 281 | plt.plot(x[sk],y[sk], linewidth=3, color=c) 282 | plt.plot(x[v>0], y[v>0],'o',markersize=8, markerfacecolor=c, markeredgecolor='k',markeredgewidth=2) 283 | plt.plot(x[v>1], y[v>1],'o',markersize=8, markerfacecolor=c, markeredgecolor=c, markeredgewidth=2) 284 | p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4) 285 | ax.add_collection(p) 286 | p = PatchCollection(polygons, facecolor='none', edgecolors=color, linewidths=2) 287 | ax.add_collection(p) 288 | elif datasetType == 'captions': 289 | for ann in anns: 290 | print(ann['caption']) 291 | 292 | def loadRes(self, resFile): 293 | """ 294 | Load result file and return a result api object. 
295 | :param resFile (str) : file name of result file 296 | :return: res (obj) : result api object 297 | """ 298 | res = COCO() 299 | res.dataset['images'] = [img for img in self.dataset['images']] 300 | 301 | print('Loading and preparing results...') 302 | tic = time.time() 303 | if type(resFile) == str or type(resFile) == unicode: 304 | anns = json.load(open(resFile)) 305 | elif type(resFile) == np.ndarray: 306 | anns = self.loadNumpyAnnotations(resFile) 307 | else: 308 | anns = resFile 309 | assert type(anns) == list, 'results in not an array of objects' 310 | annsImgIds = [ann['image_id'] for ann in anns] 311 | assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \ 312 | 'Results do not correspond to current coco set' 313 | if 'caption' in anns[0]: 314 | imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns]) 315 | res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds] 316 | for id, ann in enumerate(anns): 317 | ann['id'] = id+1 318 | elif 'bbox' in anns[0] and not anns[0]['bbox'] == []: 319 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) 320 | for id, ann in enumerate(anns): 321 | bb = ann['bbox'] 322 | x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]] 323 | if not 'segmentation' in ann: 324 | ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]] 325 | ann['area'] = bb[2]*bb[3] 326 | ann['id'] = id+1 327 | ann['iscrowd'] = 0 328 | elif 'segmentation' in anns[0]: 329 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) 330 | for id, ann in enumerate(anns): 331 | # now only support compressed RLE format as segmentation results 332 | ann['area'] = maskUtils.area(ann['segmentation']) 333 | if not 'bbox' in ann: 334 | ann['bbox'] = maskUtils.toBbox(ann['segmentation']) 335 | ann['id'] = id+1 336 | ann['iscrowd'] = 0 337 | elif 'keypoints' in anns[0]: 338 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) 339 | for id, ann in enumerate(anns): 340 | s = ann['keypoints'] 341 | x = s[0::3] 342 | y = s[1::3] 343 | x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y) 344 | ann['area'] = (x1-x0)*(y1-y0) 345 | ann['id'] = id + 1 346 | ann['bbox'] = [x0,y0,x1-x0,y1-y0] 347 | print('DONE (t={:0.2f}s)'.format(time.time()- tic)) 348 | 349 | res.dataset['annotations'] = anns 350 | res.createIndex() 351 | return res 352 | 353 | def download(self, tarDir = None, imgIds = [] ): 354 | ''' 355 | Download COCO images from mscoco.org server. 
356 | :param tarDir (str): COCO results directory name 357 | imgIds (list): images to be downloaded 358 | :return: 359 | ''' 360 | if tarDir is None: 361 | print('Please specify target directory') 362 | return -1 363 | if len(imgIds) == 0: 364 | imgs = self.imgs.values() 365 | else: 366 | imgs = self.loadImgs(imgIds) 367 | N = len(imgs) 368 | if not os.path.exists(tarDir): 369 | os.makedirs(tarDir) 370 | for i, img in enumerate(imgs): 371 | tic = time.time() 372 | fname = os.path.join(tarDir, img['file_name']) 373 | if not os.path.exists(fname): 374 | urlretrieve(img['coco_url'], fname) 375 | print('downloaded {}/{} images (t={:0.1f}s)'.format(i, N, time.time()- tic)) 376 | 377 | def loadNumpyAnnotations(self, data): 378 | """ 379 | Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class} 380 | :param data (numpy.ndarray) 381 | :return: annotations (python nested list) 382 | """ 383 | print('Converting ndarray to lists...') 384 | assert(type(data) == np.ndarray) 385 | print(data.shape) 386 | assert(data.shape[1] == 7) 387 | N = data.shape[0] 388 | ann = [] 389 | for i in range(N): 390 | if i % 1000000 == 0: 391 | print('{}/{}'.format(i,N)) 392 | ann += [{ 393 | 'image_id' : int(data[i, 0]), 394 | 'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ], 395 | 'score' : data[i, 5], 396 | 'category_id': int(data[i, 6]), 397 | }] 398 | return ann 399 | 400 | def annToRLE(self, ann): 401 | """ 402 | Convert annotation which can be polygons, uncompressed RLE to RLE. 403 | :return: binary mask (numpy 2D array) 404 | """ 405 | t = self.imgs[ann['image_id']] 406 | h, w = t['height'], t['width'] 407 | segm = ann['segmentation'] 408 | if type(segm) == list: 409 | # polygon -- a single object might consist of multiple parts 410 | # we merge all parts into one mask rle code 411 | rles = maskUtils.frPyObjects(segm, h, w) 412 | rle = maskUtils.merge(rles) 413 | elif type(segm['counts']) == list: 414 | # uncompressed RLE 415 | rle = maskUtils.frPyObjects(segm, h, w) 416 | else: 417 | # rle 418 | rle = ann['segmentation'] 419 | return rle 420 | 421 | def annToMask(self, ann): 422 | """ 423 | Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask. 424 | :return: binary mask (numpy 2D array) 425 | """ 426 | rle = self.annToRLE(ann) 427 | m = maskUtils.decode(rle) 428 | return m -------------------------------------------------------------------------------- /utils/pycocotools/mask.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tsungyi' 2 | 3 | #import pycocotools._mask as _mask 4 | from . import _mask 5 | 6 | # Interface for manipulating masks stored in RLE format. 7 | # 8 | # RLE is a simple yet efficient format for storing binary masks. RLE 9 | # first divides a vector (or vectorized image) into a series of piecewise 10 | # constant regions and then for each piece simply stores the length of 11 | # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would 12 | # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] 13 | # (note that the odd counts are always the numbers of zeros). Instead of 14 | # storing the counts directly, additional compression is achieved with a 15 | # variable bitrate representation based on a common scheme called LEB128. 16 | # 17 | # Compression is greatest given large piecewise constant regions. 
18 | # Specifically, the size of the RLE is proportional to the number of 19 | # *boundaries* in M (or for an image the number of boundaries in the y 20 | # direction). Assuming fairly simple shapes, the RLE representation is 21 | # O(sqrt(n)) where n is number of pixels in the object. Hence space usage 22 | # is substantially lower, especially for large simple objects (large n). 23 | # 24 | # Many common operations on masks can be computed directly using the RLE 25 | # (without need for decoding). This includes computations such as area, 26 | # union, intersection, etc. All of these operations are linear in the 27 | # size of the RLE, in other words they are O(sqrt(n)) where n is the area 28 | # of the object. Computing these operations on the original mask is O(n). 29 | # Thus, using the RLE can result in substantial computational savings. 30 | # 31 | # The following API functions are defined: 32 | # encode - Encode binary masks using RLE. 33 | # decode - Decode binary masks encoded via RLE. 34 | # merge - Compute union or intersection of encoded masks. 35 | # iou - Compute intersection over union between masks. 36 | # area - Compute area of encoded masks. 37 | # toBbox - Get bounding boxes surrounding encoded masks. 38 | # frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask. 39 | # 40 | # Usage: 41 | # Rs = encode( masks ) 42 | # masks = decode( Rs ) 43 | # R = merge( Rs, intersect=false ) 44 | # o = iou( dt, gt, iscrowd ) 45 | # a = area( Rs ) 46 | # bbs = toBbox( Rs ) 47 | # Rs = frPyObjects( [pyObjects], h, w ) 48 | # 49 | # In the API the following formats are used: 50 | # Rs - [dict] Run-length encoding of binary masks 51 | # R - dict Run-length encoding of binary mask 52 | # masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order) 53 | # iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore 54 | # bbs - [nx4] Bounding box(es) stored as [x y w h] 55 | # poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list) 56 | # dt,gt - May be either bounding boxes or encoded masks 57 | # Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel). 58 | # 59 | # Finally, a note about the intersection over union (iou) computation. 60 | # The standard iou of a ground truth (gt) and detected (dt) object is 61 | # iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt)) 62 | # For "crowd" regions, we use a modified criteria. If a gt object is 63 | # marked as "iscrowd", we allow a dt to match any subregion of the gt. 64 | # Choosing gt' in the crowd gt that best matches the dt can be done using 65 | # gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing 66 | # iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt) 67 | # For crowd gt regions we use this modified criteria above for the iou. 68 | # 69 | # To compile run "python setup.py build_ext --inplace" 70 | # Please do not contact us for help with compiling. 71 | # 72 | # Microsoft COCO Toolbox. version 2.0 73 | # Data, paper, and tutorials available at: http://mscoco.org/ 74 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 
75 | # Licensed under the Simplified BSD License [see coco/license.txt] 76 | 77 | iou = _mask.iou 78 | merge = _mask.merge 79 | frPyObjects = _mask.frPyObjects 80 | 81 | def encode(bimask): 82 | if len(bimask.shape) == 3: 83 | return _mask.encode(bimask) 84 | elif len(bimask.shape) == 2: 85 | h, w = bimask.shape 86 | return _mask.encode(bimask.reshape((h, w, 1), order='F'))[0] 87 | 88 | def decode(rleObjs): 89 | if type(rleObjs) == list: 90 | return _mask.decode(rleObjs) 91 | else: 92 | return _mask.decode([rleObjs])[:,:,0] 93 | 94 | def area(rleObjs): 95 | if type(rleObjs) == list: 96 | return _mask.area(rleObjs) 97 | else: 98 | return _mask.area([rleObjs])[0] 99 | 100 | def toBbox(rleObjs): 101 | if type(rleObjs) == list: 102 | return _mask.toBbox(rleObjs) 103 | else: 104 | return _mask.toBbox([rleObjs])[0] 105 | -------------------------------------------------------------------------------- /utils/pycocotools/maskApi.c: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #include "maskApi.h" 8 | #include 9 | #include 10 | 11 | uint umin( uint a, uint b ) { return (ab) ? a : b; } 13 | 14 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) { 15 | R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m); 16 | siz j; if(cnts) for(j=0; jcnts[j]=cnts[j]; 17 | } 18 | 19 | void rleFree( RLE *R ) { 20 | free(R->cnts); R->cnts=0; 21 | } 22 | 23 | void rlesInit( RLE **R, siz n ) { 24 | siz i; *R = (RLE*) malloc(sizeof(RLE)*n); 25 | for(i=0; i0 ) { 61 | c=umin(ca,cb); cc+=c; ct=0; 62 | ca-=c; if(!ca && a0) { 83 | crowd=iscrowd!=NULL && iscrowd[g]; 84 | if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; } 85 | siz ka, kb, a, b; uint c, ca, cb, ct, i, u; int va, vb; 86 | ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0; 87 | cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1; 88 | while( ct>0 ) { 89 | c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0; 90 | ca-=c; if(!ca && athr) keep[j]=0; 105 | } 106 | } 107 | } 108 | 109 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) { 110 | double h, w, i, u, ga, da; siz g, d; int crowd; 111 | for( g=0; gthr) keep[j]=0; 129 | } 130 | } 131 | } 132 | 133 | void rleToBbox( const RLE *R, BB bb, siz n ) { 134 | siz i; for( i=0; id?1:c=dy && xs>xe) || (dxye); 173 | if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; } 174 | s = dx>=dy ? 
(double)(ye-ys)/dx : (double)(xe-xs)/dy; 175 | if(dx>=dy) for( d=0; d<=dx; d++ ) { 176 | t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++; 177 | } else for( d=0; d<=dy; d++ ) { 178 | t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++; 179 | } 180 | } 181 | /* get points along y-boundary and downsample */ 182 | free(x); free(y); k=m; m=0; double xd, yd; 183 | x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k); 184 | for( j=1; jw-1 ) continue; 187 | yd=(double)(v[j]h) yd=h; yd=ceil(yd); 189 | x[m]=(int) xd; y[m]=(int) yd; m++; 190 | } 191 | /* compute rle encoding given y-boundary points */ 192 | k=m; a=malloc(sizeof(uint)*(k+1)); 193 | for( j=0; j0) b[m++]=a[j++]; else { 199 | j++; if(jm, p=0; long x; int more; 206 | char *s=malloc(sizeof(char)*m*6); 207 | for( i=0; icnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1; 209 | while( more ) { 210 | char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0; 211 | if(more) c |= 0x20; c+=48; s[p++]=c; 212 | } 213 | } 214 | s[p]=0; return s; 215 | } 216 | 217 | void rleFrString( RLE *R, char *s, siz h, siz w ) { 218 | siz m=0, p=0, k; long x; int more; uint *cnts; 219 | while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0; 220 | while( s[p] ) { 221 | x=0; k=0; more=1; 222 | while( more ) { 223 | char c=s[p]-48; x |= (c & 0x1f) << 5*k; 224 | more = c & 0x20; p++; k++; 225 | if(!more && (c & 0x10)) x |= -1 << 5*k; 226 | } 227 | if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x; 228 | } 229 | rleInit(R,h,w,m,cnts); free(cnts); 230 | } 231 | -------------------------------------------------------------------------------- /utils/pycocotools/maskApi.h: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #pragma once 8 | 9 | typedef unsigned int uint; 10 | typedef unsigned long siz; 11 | typedef unsigned char byte; 12 | typedef double* BB; 13 | typedef struct { siz h, w, m; uint *cnts; } RLE; 14 | 15 | /* Initialize/destroy RLE. */ 16 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ); 17 | void rleFree( RLE *R ); 18 | 19 | /* Initialize/destroy RLE array. */ 20 | void rlesInit( RLE **R, siz n ); 21 | void rlesFree( RLE **R, siz n ); 22 | 23 | /* Encode binary masks using RLE. */ 24 | void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n ); 25 | 26 | /* Decode binary masks encoded via RLE. */ 27 | void rleDecode( const RLE *R, byte *mask, siz n ); 28 | 29 | /* Compute union or intersection of encoded masks. */ 30 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect ); 31 | 32 | /* Compute area of encoded masks. */ 33 | void rleArea( const RLE *R, siz n, uint *a ); 34 | 35 | /* Compute intersection over union between masks. */ 36 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ); 37 | 38 | /* Compute non-maximum suppression between bounding masks */ 39 | void rleNms( RLE *dt, siz n, uint *keep, double thr ); 40 | 41 | /* Compute intersection over union between bounding boxes. 
*/ 42 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ); 43 | 44 | /* Compute non-maximum suppression between bounding boxes */ 45 | void bbNms( BB dt, siz n, uint *keep, double thr ); 46 | 47 | /* Get bounding boxes surrounding encoded masks. */ 48 | void rleToBbox( const RLE *R, BB bb, siz n ); 49 | 50 | /* Convert bounding boxes to encoded masks. */ 51 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ); 52 | 53 | /* Convert polygon to encoded mask. */ 54 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ); 55 | 56 | /* Get compressed string representation of encoded mask. */ 57 | char* rleToString( const RLE *R ); 58 | 59 | /* Convert from compressed string representation of encoded mask. */ 60 | void rleFrString( RLE *R, char *s, siz h, siz w ); 61 | -------------------------------------------------------------------------------- /utils/timer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import time 9 | 10 | 11 | class Timer(object): 12 | """A simple timer.""" 13 | def __init__(self): 14 | self.total_time = 0. 15 | self.calls = 0 16 | self.start_time = 0. 17 | self.diff = 0. 18 | self.average_time = 0. 19 | 20 | def tic(self): 21 | # using time.time instead of time.clock because time.clock 22 | # does not normalize for multithreading 23 | self.start_time = time.time() 24 | 25 | def toc(self, average=True): 26 | self.diff = time.time() - self.start_time 27 | self.total_time += self.diff 28 | self.calls += 1 29 | self.average_time = self.total_time / self.calls 30 | if average: 31 | return self.average_time 32 | else: 33 | return self.diff 34 | 35 | def clear(self): 36 | self.total_time = 0. 37 | self.calls = 0 38 | self.start_time = 0. 39 | self.diff = 0. 40 | self.average_time = 0. 41 | --------------------------------------------------------------------------------
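The RLE helpers in `utils/pycocotools/mask.py` and the `Timer` class above are easiest to see in a short round trip. The sketch below is an illustration only, not a file from the repository: it assumes the repository root is on `PYTHONPATH`, that the `_mask` Cython extension has been compiled (e.g. via `./make.sh`), and that the import paths `utils.pycocotools.mask` and `utils.timer` follow from the directory layout shown above. It encodes a small Fortran-ordered uint8 mask to RLE, decodes it back, and queries area, bounding box, and IoU while timing the calls.

```python
# Usage sketch only -- not part of the repository. Assumes the repo root is on
# PYTHONPATH and the _mask Cython extension has been built (./make.sh).
import numpy as np

from utils.pycocotools import mask as maskUtils
from utils.timer import Timer

# Per the comments in mask.py, masks must be uint8 in column-major (Fortran) order.
m = np.zeros((32, 32), dtype=np.uint8, order='F')
m[8:24, 8:24] = 1

timer = Timer()
timer.tic()
rle = maskUtils.encode(m)        # 2D mask -> single RLE dict
decoded = maskUtils.decode(rle)  # RLE dict -> 2D uint8 mask
elapsed = timer.toc(average=False)

print('round-trip ok :', np.array_equal(m, decoded))
print('area          :', maskUtils.area(rle))               # 256 foreground pixels
print('bbox [x y w h]:', maskUtils.toBbox(rle))              # expected [8. 8. 16. 16.]
print('self IoU      :', maskUtils.iou([rle], [rle], [0]))   # expected [[1.]]
print('encode/decode took {:.4f}s'.format(elapsed))
```

Note that `encode` expects column-major uint8 input, as stated in the mask.py header comment, and that `toc(average=False)` returns the elapsed time of the most recent `tic`/`toc` pair rather than the running average.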