├── .gitignore ├── LICENSE ├── README.md ├── convert_darknet.py ├── data ├── coco.py ├── config.py ├── data_augment.py ├── voc0712.py └── voc_eval.py ├── demo.py ├── eval.py ├── images ├── dog.jpg ├── eagle.jpg └── person.jpg ├── layers ├── multiyolo_loss.py ├── weight_mseloss.py ├── yolo_layer.py └── yolo_loss.py ├── make.sh ├── model ├── __init__.py ├── darknet53.py └── yolo.py ├── output ├── output_dog.jpg ├── output_eagle.jpg └── output_person.jpg ├── train.py └── utils ├── box_utils.py ├── build.py ├── gen_anchors.py ├── nms ├── __init__.py ├── cpu_nms.pyx ├── gpu_nms.hpp ├── gpu_nms.pyx ├── nms_kernel.cu └── py_cpu_nms.py ├── nms_wrapper.py ├── preprocess.py ├── pycocotools ├── __init__.py ├── _mask.pyx ├── coco.py ├── cocoeval.py ├── mask.py ├── maskApi.c └── maskApi.h └── timer.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | *log 28 | *.json 29 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Max deGroot, Ellis Brown 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## YOLO v3 Implementation with PyTorch 2 | > This repository contains only the detection module; it does not need the cfg file from the original YOLOv3, because the network is implemented directly in PyTorch. 3 | 4 | This repository is based on the official code of [YOLOv3](https://github.com/pjreddie/darknet) and [pytorch-yolo-v3](https://github.com/ayooshkathuria/pytorch-yolo-v3). There is already another PyTorch implementation of YOLOv3, but it uses a config file rather than a normal PyTorch approach to defining the network. One of the goals of this repository is to remove the cfg file.
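Instead of parsing a cfg file, the network is defined with ordinary PyTorch modules in `model/darknet53.py` and `model/yolo.py`. As a rough illustration of the idea (a sketch only; the class name below is illustrative, not the repository's exact code), the basic Darknet building block is just a small `nn.Module`:

```python
import torch.nn as nn

class ConvBN(nn.Module):
    """Convolution + BatchNorm + LeakyReLU, the basic Darknet building block."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super(ConvBN, self).__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, pad, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```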
5 | 6 | ## Requirements 7 | 8 | * Python 3.5 9 | * OpenCV 10 | * PyTorch 0.4 11 | 12 | ## Installation 13 | 14 | * Install PyTorch-0.4.0 by selecting your environment on the website and running the appropriate command. 15 | * Clone this repository 16 | * Compile the nms 17 | * Convert yolov3.weights to PyTorch 18 | 19 | ```shell 20 | cd YOLOv3_Pytorch 21 | ./make.sh 22 | 23 | mkdir weights 24 | cd weights 25 | wget https://pjreddie.com/media/files/yolov3.weights 26 | cd .. 27 | python convert_darknet.py --version coco --weights ./weights/yolov3.weights --save_name ./weights/convert_yolov3_coco.pth 28 | # this will produce convert_yolov3_coco.pth 29 | ``` 30 | 31 | ## Train 32 | > We only train on the VOC dataset because we do not have enough GPUs to train on COCO. This is still an experimental repository, and it does not yet fully reproduce the original results. 33 | 34 | ### dataset 35 | [merge VOC dataset](https://github.com/yqyao/DRFNet#voc-dataset) 36 | 37 | * structure 38 | 39 | ./data/datasets/VOCdevkit0712/VOC0712/Annotations 40 | ./data/datasets/VOCdevkit0712/VOC0712/ImageSets 41 | ./data/datasets/VOCdevkit0712/VOC0712/JPEGImages 42 | 43 | * COCO 44 | 45 | Same as [COCO](https://github.com/yqyao/DRFNet#coco-dataset) 46 | 47 | ### train 48 | > You can enable multiscale training by setting `multiscale` in `voc_config` in `data/config.py`. 49 | 50 | * convert weights 51 | ```shell 52 | cd weights 53 | wget https://pjreddie.com/media/files/darknet53.conv.74 54 | cd ../ 55 | python convert_darknet.py --version darknet53 --weights ./weights/darknet53.conv.74 --save_name ./weights/convert_darknet53.pth 56 | ``` 57 | 58 | * train yolov3 59 | 60 | ```shell 61 | python train.py --input_wh 416 416 -b 64 --subdivisions 4 -d VOC --basenet ./weights/convert_darknet53.pth 62 | 63 | ``` 64 | 65 | ### eval 66 | 67 | ```shell 68 | 69 | python eval.py --weights ./weights/convert_yolov3_voc.pth --dataset VOC --input_wh 416 416 70 | ``` 71 | > "darknet voc" was trained with Darknet; "pytorch voc" was trained with this repository. 72 | 73 | **results** 74 | 75 | | | darknet voc 608 | darknet voc 416 | pytorch voc 608 | pytorch voc 416 | 76 | | :-: | :-: | :-: | :-: | :-: | 77 | | mAP | 77.2% | 76.2% | 74.7% | 74.9% | 78 | | time | 27ms | 18ms | 27ms | 18ms | 79 | 80 | ## Demo 81 | 82 | ```shell 83 | 84 | python demo.py --images images --save_path ./output --weights ./weights/convert_yolov3_coco.pth -d COCO 85 | 86 | ``` 87 | 88 | ## Example 89 | 90 | 91 | 92 | 93 | ## References 94 | - [YOLOv3: An Incremental Improvement](https://pjreddie.com/media/files/papers/YOLOv3.pdf) 95 | 96 | - [Original Implementation (Darknet)](https://github.com/pjreddie/darknet) 97 | 98 | - [pytorch-yolo-v3](https://github.com/ayooshkathuria/pytorch-yolo-v3) 99 | 100 | - [pytorch-yolo2](https://github.com/marvis/pytorch-yolo2) 101 | -------------------------------------------------------------------------------- /convert_darknet.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import numpy as np 9 | from data.config import voc_config, coco_config 10 | from model.yolo import Yolov3 11 | from model.darknet53 import Darknet53 12 | import argparse 13 | import os 14 | 15 | def copy_weights(bn, conv, ptr, weights, use_bn=True): 16 | if use_bn: 17 | num_bn_biases = bn.bias.numel() 18 | 19 | #Load the weights 20 | bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases]) 21 |
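# NOTE: a Darknet .weights file is a flat float32 array. For a conv layer followed
# by batch norm it stores, in order: BN biases (beta), BN weights (gamma),
# BN running mean, BN running var, and then the conv kernel weights; the loads in
# this function simply advance `ptr` through that flat array in the same order.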
ptr += num_bn_biases 22 | 23 | bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 24 | ptr += num_bn_biases 25 | 26 | bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 27 | ptr += num_bn_biases 28 | 29 | bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 30 | ptr += num_bn_biases 31 | 32 | #Cast the loaded weights into dims of model weights. 33 | bn_biases = bn_biases.view_as(bn.bias.data) 34 | bn_weights = bn_weights.view_as(bn.weight.data) 35 | bn_running_mean = bn_running_mean.view_as(bn.running_mean) 36 | bn_running_var = bn_running_var.view_as(bn.running_var) 37 | 38 | #Copy the data to model 39 | bn.bias.data.copy_(bn_biases) 40 | bn.weight.data.copy_(bn_weights) 41 | bn.running_mean.copy_(bn_running_mean) 42 | bn.running_var.copy_(bn_running_var) 43 | else: 44 | #Number of biases 45 | num_biases = conv.bias.numel() 46 | 47 | #Load the weights 48 | conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases]) 49 | ptr = ptr + num_biases 50 | 51 | #reshape the loaded weights according to the dims of the model weights 52 | conv_biases = conv_biases.view_as(conv.bias.data) 53 | 54 | #Finally copy the data 55 | conv.bias.data.copy_(conv_biases) 56 | 57 | #Let us load the weights for the Convolutional layers 58 | num_weights = conv.weight.numel() 59 | conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights]) 60 | ptr = ptr + num_weights 61 | 62 | conv_weights = conv_weights.view_as(conv.weight.data) 63 | conv.weight.data.copy_(conv_weights) 64 | return ptr 65 | 66 | def load_weights_darknet53(weightfile, yolov3): 67 | fp = open(weightfile, "rb") 68 | #The first 5 values are header information 69 | # 1. Major version number 70 | # 2. Minor Version Number 71 | # 3. Subversion number 72 | # 4. IMages seen 73 | header = np.fromfile(fp, dtype = np.int32, count = 5) 74 | weights = np.fromfile(fp, dtype = np.float32) 75 | print(len(weights)) 76 | ptr = 0 77 | first_conv = yolov3.conv 78 | bn = first_conv.bn 79 | conv = first_conv.conv 80 | # first conv copy 81 | ptr = copy_weights(bn, conv, ptr, weights) 82 | 83 | layers = [yolov3.layer1, yolov3.layer2, yolov3.layer3, yolov3.layer4, yolov3.layer5] 84 | for layer in layers: 85 | for i in range(len(layer)): 86 | if i == 0: 87 | bn = layer[i].bn 88 | conv = layer[i].conv 89 | ptr = copy_weights(bn, conv, ptr, weights) 90 | else: 91 | bn = layer[i].conv1.bn 92 | conv = layer[i].conv1.conv 93 | ptr = copy_weights(bn, conv, ptr, weights) 94 | bn = layer[i].conv2.bn 95 | conv = layer[i].conv2.conv 96 | ptr = copy_weights(bn, conv, ptr, weights) 97 | print(ptr) 98 | fp.close() 99 | 100 | def load_weights(weightfile, yolov3, version): 101 | if version == "voc" or version == "coco": 102 | load_weights_yolov3(weightfile, yolov3) 103 | elif version == "darknet53": 104 | load_weights_darknet53(weightfile, yolov3) 105 | 106 | def load_weights_yolov3(weightfile, yolov3): 107 | fp = open(weightfile, "rb") 108 | #The first 5 values are header information 109 | # 1. Major version number 110 | # 2. Minor Version Number 111 | # 3. Subversion number 112 | # 4, 5. 
IMages seen 113 | header = np.fromfile(fp, dtype = np.int32, count = 5) 114 | weights = np.fromfile(fp, dtype = np.float32) 115 | print(len(weights)) 116 | ptr = 0 117 | extractor = yolov3.extractor 118 | first_conv = extractor.conv 119 | bn = first_conv.bn 120 | conv = first_conv.conv 121 | # first conv copy 122 | ptr = copy_weights(bn, conv, ptr, weights) 123 | 124 | layers = [extractor.layer1, extractor.layer2, extractor.layer3, extractor.layer4, extractor.layer5] 125 | for layer in layers: 126 | for i in range(len(layer)): 127 | if i == 0: 128 | bn = layer[i].bn 129 | conv = layer[i].conv 130 | ptr = copy_weights(bn, conv, ptr, weights) 131 | else: 132 | bn = layer[i].conv1.bn 133 | conv = layer[i].conv1.conv 134 | ptr = copy_weights(bn, conv, ptr, weights) 135 | bn = layer[i].conv2.bn 136 | conv = layer[i].conv2.conv 137 | ptr = copy_weights(bn, conv, ptr, weights) 138 | predict_conv_list1 = yolov3.predict_conv_list1 139 | smooth_conv1 = yolov3.smooth_conv1 140 | predict_conv_list2 = yolov3.predict_conv_list2 141 | smooth_conv2 = yolov3.smooth_conv2 142 | predict_conv_list3 = yolov3.predict_conv_list3 143 | for i in range(len(predict_conv_list1)): 144 | if i == 6: 145 | bn = 0 146 | conv = predict_conv_list1[i] 147 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False) 148 | else: 149 | bn = predict_conv_list1[i].bn 150 | conv = predict_conv_list1[i].conv 151 | ptr = copy_weights(bn, conv, ptr, weights) 152 | bn = smooth_conv1.bn 153 | conv = smooth_conv1.conv 154 | ptr = copy_weights(bn, conv, ptr, weights) 155 | for i in range(len(predict_conv_list2)): 156 | if i == 6: 157 | bn = 0 158 | conv = predict_conv_list2[i] 159 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False) 160 | else: 161 | bn = predict_conv_list2[i].bn 162 | conv = predict_conv_list2[i].conv 163 | ptr = copy_weights(bn, conv, ptr, weights) 164 | bn = smooth_conv2.bn 165 | conv = smooth_conv2.conv 166 | ptr = copy_weights(bn, conv, ptr, weights) 167 | 168 | for i in range(len(predict_conv_list3)): 169 | if i == 6: 170 | bn = 0 171 | conv = predict_conv_list3[i] 172 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False) 173 | else: 174 | bn = predict_conv_list3[i].bn 175 | conv = predict_conv_list3[i].conv 176 | ptr = copy_weights(bn, conv, ptr, weights) 177 | print(ptr) 178 | fp.close() 179 | 180 | 181 | def arg_parse(): 182 | """ 183 | Parse arguments to the train module 184 | """ 185 | parser = argparse.ArgumentParser( 186 | description='Yolov3 pytorch Training') 187 | parser.add_argument('--input_wh', default=(416, 416), 188 | help='input size.') 189 | parser.add_argument('--version', '--version', default='coco', 190 | help='voc, coco, darknet53') 191 | parser.add_argument('--weights', default='./weights/yolov3.weights', help='pretrained base model') 192 | parser.add_argument('--save_name', default='./weights/convert_yolov3_coco.pth', help='save name') 193 | 194 | return parser.parse_args() 195 | 196 | def load_weights_darknet19(weightfile, darknet19): 197 | fp = open(weightfile, "rb") 198 | #The first 4 values are header information 199 | # 1. Major version number 200 | # 2. Minor Version Number 201 | # 3. Subversion number 202 | # 4. 
Images seen 203 | header = np.fromfile(fp, dtype = np.int32, count=4) 204 | weights = np.fromfile(fp, dtype = np.float32) 205 | ptr = 0 206 | first_conv = darknet19.conv 207 | bn = first_conv.bn 208 | conv = first_conv.conv 209 | # first conv copy 210 | ptr = copy_weights(bn, conv, ptr, weights) 211 | layers = [darknet19.layer1, darknet19.layer2, darknet19.layer3, darknet19.layer4, darknet19.layer5] 212 | for layer in layers: 213 | for i in range(len(layer)): 214 | if i == 0: 215 | pass 216 | else: 217 | bn = layer[i].bn 218 | conv = layer[i].conv 219 | ptr = copy_weights(bn, conv, ptr, weights) 220 | fp.close() 221 | 222 | if __name__ == '__main__': 223 | args = arg_parse() 224 | weightfile = args.weights 225 | input_wh = args.input_wh 226 | version = args.version 227 | save_name = args.save_name 228 | if version == "voc": 229 | cfg = voc_config 230 | yolov3 = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 231 | elif version == "coco": 232 | cfg = coco_config 233 | yolov3 = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 234 | elif version == "darknet53": 235 | cfg = voc_config 236 | num_blocks = [1,2,8,8,4] 237 | yolov3 = Darknet53(num_blocks) 238 | else: 239 | print("Unknown version!!!") 240 | import sys 241 | sys.exit() 242 | 243 | load_weights(weightfile, yolov3, version) 244 | # name = "convert_yolo_" + version + ".pth" 245 | # save_path = os.path.join("./weights", name) 246 | torch.save(yolov3.state_dict(), save_name) 247 | 248 | 249 | 250 | -------------------------------------------------------------------------------- /data/coco.py: -------------------------------------------------------------------------------- 1 | """COCO Dataset Classes 2 | 3 | Original author: Francisco Massa 4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py 5 | 6 | Updated by: Ellis Brown, Max deGroot 7 | """ 8 | import random # needed by the multiscale branch in __getitem__ 9 | import os 10 | import pickle 11 | import os.path 12 | import sys 13 | import torch 14 | import torch.utils.data as data 15 | import torchvision.transforms as transforms 16 | import cv2 17 | import numpy as np 18 | import json 19 | import uuid 20 | from .data_augment import preproc 21 | 22 | from pycocotools.coco import COCO 23 | from pycocotools.cocoeval import COCOeval 24 | from pycocotools import mask as COCOmask 25 | 26 | class COCOAnnotationTransform(object): 27 | """Transforms a COCO annotation into a Tensor of bbox coords and label index 28 | Initialized with a dictionary lookup of classnames to indexes 29 | 30 | Arguments: 31 | class_to_ind (dict, optional): dictionary lookup of classnames -> indexes 32 | (default: alphabetic indexing of VOC's 20 classes) 33 | keep_difficult (bool, optional): keep difficult instances or not 34 | (default: False) 35 | height (int): height 36 | width (int): width 37 | """ 38 | 39 | def __init__(self): 40 | pass 41 | 42 | def __call__(self, target, width, height): 43 | """ 44 | Arguments: 45 | target (annotation) : the target annotation to be made usable, 46 | will not be normalized 47 | Returns: 48 | a list containing lists of bounding boxes [bbox coords, class name] 49 | """ 50 | 51 | boxes = target[:,:-1].copy() 52 | labels = target[:,-1].copy() 53 | boxes[:, 0::2] /= width 54 | boxes[:, 1::2] /= height 55 | b_w = (boxes[:, 2] - boxes[:, 0])*1. 56 | b_h = (boxes[:, 3] - boxes[:, 1])*1.
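# drop degenerate boxes: keep only those whose shorter (normalized) side is greater than 0.01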
57 | mask_b= np.minimum(b_w, b_h) > 0.01 58 | boxes_t = boxes[mask_b] 59 | labels_t = labels[mask_b].copy() 60 | 61 | return boxes_t, labels_t 62 | 63 | 64 | class COCODetection(data.Dataset): 65 | 66 | """VOC Detection Dataset Object 67 | 68 | input is image, target is annotation 69 | 70 | Arguments: 71 | root (string): filepath to VOCdevkit folder. 72 | image_set (string): imageset to use (eg. 'train', 'val', 'test') 73 | transform (callable, optional): transformation to perform on the 74 | input image 75 | target_transform (callable, optional): transformation to perform on the 76 | target `annotation` 77 | (eg: take in caption string, return tensor of word indices) 78 | dataset_name (string, optional): which dataset to load 79 | (default: 'VOC2007') 80 | """ 81 | 82 | def __init__(self, root, image_sets, resize_wh, batch_size, multiscale=False, dataset_name='COCO'): 83 | self.root = root 84 | self.cache_path = os.path.join(self.root, 'cache') 85 | self.image_set = image_sets 86 | self.name = dataset_name 87 | self.resize_wh = resize_wh 88 | self.batch_size = batch_size 89 | self.multiscale = multiscale 90 | self.transform = preproc() 91 | self.ids = list() 92 | self.annotations = list() 93 | self._view_map = { 94 | 'minival2014' : 'val2014', # 5k val2014 subset 95 | 'valminusminival2014' : 'val2014', # val2014 \setminus minival2014 96 | 'test-dev2015' : 'test2015', 97 | } 98 | 99 | for (year, image_set) in image_sets: 100 | coco_name = image_set+year 101 | data_name = (self._view_map[coco_name] 102 | if coco_name in self._view_map 103 | else coco_name) 104 | annofile = self._get_ann_file(coco_name) 105 | _COCO = COCO(annofile) 106 | self._COCO = _COCO 107 | self.coco_name = coco_name 108 | cats = _COCO.loadCats(_COCO.getCatIds()) 109 | self._classes = tuple([c['name'] for c in cats]) 110 | self.num_classes = len(self._classes) 111 | self._class_to_ind = dict(zip(self._classes, range(self.num_classes))) 112 | self._class_to_coco_cat_id = dict(zip([c['name'] for c in cats], 113 | _COCO.getCatIds())) 114 | indexes = _COCO.getImgIds() 115 | self.image_indexes = indexes 116 | self.ids.extend([self.image_path_from_index(data_name, index) for index in indexes ]) 117 | if image_set.find('test') != -1: 118 | print('test set will not load annotations!') 119 | else: 120 | self.annotations.extend(self._load_coco_annotations(coco_name, indexes,_COCO)) 121 | 122 | 123 | 124 | def image_path_from_index(self, name, index): 125 | """ 126 | Construct an image path from the image's "index" identifier. 
127 | """ 128 | # Example image path for index=119993: 129 | # images/train2014/COCO_train2014_000000119993.jpg 130 | file_name = ('COCO_' + name + '_' + 131 | str(index).zfill(12) + '.jpg') 132 | image_path = os.path.join(self.root, 'images', 133 | name, file_name) 134 | assert os.path.exists(image_path), \ 135 | 'Path does not exist: {}'.format(image_path) 136 | return image_path 137 | 138 | 139 | def _get_ann_file(self, name): 140 | prefix = 'instances' if name.find('test') == -1 \ 141 | else 'image_info' 142 | return os.path.join(self.root, 'annotations', 143 | prefix + '_' + name + '.json') 144 | 145 | 146 | def _load_coco_annotations(self, coco_name, indexes, _COCO): 147 | cache_file=os.path.join(self.cache_path,coco_name+'_gt_roidb.pkl') 148 | if os.path.exists(cache_file): 149 | with open(cache_file, 'rb') as fid: 150 | roidb = pickle.load(fid) 151 | print('{} gt roidb loaded from {}'.format(coco_name,cache_file)) 152 | return roidb 153 | 154 | gt_roidb = [self._annotation_from_index(index, _COCO) 155 | for index in indexes] 156 | with open(cache_file, 'wb') as fid: 157 | pickle.dump(gt_roidb,fid,pickle.HIGHEST_PROTOCOL) 158 | print('wrote gt roidb to {}'.format(cache_file)) 159 | return gt_roidb 160 | 161 | 162 | def _annotation_from_index(self, index, _COCO): 163 | """ 164 | Loads COCO bounding-box instance annotations. Crowd instances are 165 | handled by marking their overlaps (with all categories) to -1. This 166 | overlap value means that crowd "instances" are excluded from training. 167 | """ 168 | im_ann = _COCO.loadImgs(index)[0] 169 | width = im_ann['width'] 170 | height = im_ann['height'] 171 | 172 | annIds = _COCO.getAnnIds(imgIds=index, iscrowd=None) 173 | objs = _COCO.loadAnns(annIds) 174 | # Sanitize bboxes -- some are invalid 175 | valid_objs = [] 176 | for obj in objs: 177 | x1 = np.max((0, obj['bbox'][0])) 178 | y1 = np.max((0, obj['bbox'][1])) 179 | x2 = np.min((width - 1, x1 + np.max((0, obj['bbox'][2] - 1)))) 180 | y2 = np.min((height - 1, y1 + np.max((0, obj['bbox'][3] - 1)))) 181 | if obj['area'] > 0 and x2 >= x1 and y2 >= y1: 182 | obj['clean_bbox'] = [x1, y1, x2, y2] 183 | valid_objs.append(obj) 184 | objs = valid_objs 185 | num_objs = len(objs) 186 | 187 | res = np.zeros((num_objs, 5)) 188 | 189 | # Lookup table to map from COCO category ids to our internal class 190 | # indices 191 | coco_cat_id_to_class_ind = dict([(self._class_to_coco_cat_id[cls], 192 | self._class_to_ind[cls]) 193 | for cls in self._classes]) 194 | 195 | for ix, obj in enumerate(objs): 196 | cls = coco_cat_id_to_class_ind[obj['category_id']] 197 | res[ix, 0:4] = obj['clean_bbox'] 198 | res[ix, 4] = cls 199 | 200 | return res 201 | 202 | 203 | 204 | def __getitem__(self, index): 205 | img_id = self.ids[index] 206 | target = self.annotations[index] 207 | if self.multiscale: 208 | if index % (self.batch_size * 20) == 0: 209 | rnd = (random.randint(0,9) + 10) * 32 210 | print("resize scale", index, rnd) 211 | self.resize_wh = (rnd, rnd) 212 | img = cv2.imread(img_id, cv2.IMREAD_COLOR) 213 | height, width, _ = img.shape 214 | 215 | if self.transform is not None: 216 | img, target = self.transform(img, target, self.resize_wh) 217 | 218 | return img, target 219 | 220 | def __len__(self): 221 | return len(self.ids) 222 | 223 | def pull_image(self, index): 224 | '''Returns the original image object at index in PIL form 225 | 226 | Note: not using self.__getitem__(), as any transformations passed in 227 | could mess up this functionality. 
228 | 229 | Argument: 230 | index (int): index of img to show 231 | Return: 232 | PIL img 233 | ''' 234 | img_id = self.ids[index] 235 | return cv2.imread(img_id, cv2.IMREAD_COLOR), img_id 236 | 237 | 238 | def pull_tensor(self, index): 239 | '''Returns the original image at an index in tensor form 240 | 241 | Note: not using self.__getitem__(), as any transformations passed in 242 | could mess up this functionality. 243 | 244 | Argument: 245 | index (int): index of img to show 246 | Return: 247 | tensorized version of img, squeezed 248 | ''' 249 | to_tensor = transforms.ToTensor() 250 | return torch.Tensor(self.pull_image(index)).unsqueeze_(0) 251 | 252 | def _print_detection_eval_metrics(self, coco_eval): 253 | IoU_lo_thresh = 0.5 254 | IoU_hi_thresh = 0.95 255 | def _get_thr_ind(coco_eval, thr): 256 | ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) & 257 | (coco_eval.params.iouThrs < thr + 1e-5))[0][0] 258 | iou_thr = coco_eval.params.iouThrs[ind] 259 | assert np.isclose(iou_thr, thr) 260 | return ind 261 | 262 | ind_lo = _get_thr_ind(coco_eval, IoU_lo_thresh) 263 | ind_hi = _get_thr_ind(coco_eval, IoU_hi_thresh) 264 | # precision has dims (iou, recall, cls, area range, max dets) 265 | # area range index 0: all area ranges 266 | # max dets index 2: 100 per image 267 | precision = \ 268 | coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2] 269 | ap_default = np.mean(precision[precision > -1]) 270 | print('~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] ' 271 | '~~~~'.format(IoU_lo_thresh, IoU_hi_thresh)) 272 | print('{:.1f}'.format(100 * ap_default)) 273 | for cls_ind, cls in enumerate(self._classes): 274 | if cls == '__background__': 275 | continue 276 | # minus 1 because of __background__ 277 | precision = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2] 278 | ap = np.mean(precision[precision > -1]) 279 | print('{:.1f}'.format(100 * ap)) 280 | 281 | print('~~~~ Summary metrics ~~~~') 282 | coco_eval.summarize() 283 | 284 | def _do_detection_eval(self, res_file, output_dir): 285 | ann_type = 'bbox' 286 | coco_dt = self._COCO.loadRes(res_file) 287 | coco_eval = COCOeval(self._COCO, coco_dt) 288 | coco_eval.params.useSegm = (ann_type == 'segm') 289 | coco_eval.evaluate() 290 | coco_eval.accumulate() 291 | self._print_detection_eval_metrics(coco_eval) 292 | eval_file = os.path.join(output_dir, 'detection_results.pkl') 293 | with open(eval_file, 'wb') as fid: 294 | pickle.dump(coco_eval, fid, pickle.HIGHEST_PROTOCOL) 295 | print('Wrote COCO eval results to: {}'.format(eval_file)) 296 | 297 | def _coco_results_one_category(self, boxes, cat_id): 298 | results = [] 299 | for im_ind, index in enumerate(self.image_indexes): 300 | # print(type(boxes[im_ind])) 301 | # print(boxes[im_ind]) 302 | # dets = boxes[im_ind].astype(np.float) 303 | dets = boxes[im_ind] 304 | if dets == []: 305 | continue 306 | dets = boxes[im_ind].astype(np.float) 307 | scores = dets[:, -1] 308 | xs = dets[:, 0] 309 | ys = dets[:, 1] 310 | ws = dets[:, 2] - xs + 1 311 | hs = dets[:, 3] - ys + 1 312 | results.extend( 313 | [{'image_id' : index, 314 | 'category_id' : cat_id, 315 | 'bbox' : [xs[k], ys[k], ws[k], hs[k]], 316 | 'score' : scores[k]} for k in range(dets.shape[0])]) 317 | return results 318 | 319 | def _write_coco_results_file(self, all_boxes, res_file): 320 | # [{"image_id": 42, 321 | # "category_id": 18, 322 | # "bbox": [258.15,41.29,348.26,243.78], 323 | # "score": 0.236}, ...] 
324 | results = [] 325 | for cls_ind, cls in enumerate(self._classes): 326 | if cls == '__background__': 327 | continue 328 | print('Collecting {} results ({:d}/{:d})'.format(cls, cls_ind, 329 | self.num_classes )) 330 | coco_cat_id = self._class_to_coco_cat_id[cls] 331 | results.extend(self._coco_results_one_category(all_boxes[cls_ind], 332 | coco_cat_id)) 333 | ''' 334 | if cls_ind ==30: 335 | res_f = res_file+ '_1.json' 336 | print('Writing results json to {}'.format(res_f)) 337 | with open(res_f, 'w') as fid: 338 | json.dump(results, fid) 339 | results = [] 340 | ''' 341 | #res_f2 = res_file+'_2.json' 342 | print('Writing results json to {}'.format(res_file)) 343 | with open(res_file, 'w') as fid: 344 | json.dump(results, fid) 345 | 346 | def evaluate_detections(self, all_boxes, output_dir): 347 | res_file = os.path.join(output_dir, ('detections_' + 348 | self.coco_name + 349 | '_results')) 350 | res_file += '.json' 351 | self._write_coco_results_file(all_boxes, res_file) 352 | # Only do evaluation on non-test sets 353 | if self.coco_name.find('test') == -1: 354 | self._do_detection_eval(res_file, output_dir) 355 | # Optionally cleanup results json file 356 | 357 | -------------------------------------------------------------------------------- /data/config.py: -------------------------------------------------------------------------------- 1 | # config.py 2 | import os 3 | import os.path 4 | 5 | pwd = os.getcwd() 6 | VOCroot = os.path.join(pwd, "data/datasets/VOCdevkit0712/") 7 | COCOroot = os.path.join(pwd, "data/datasets/coco2015") 8 | 9 | datasets_dict = {"VOC": [('0712', '0712_trainval')], 10 | "VOC0712++": [('0712', '0712_trainval_test')], 11 | "VOC2012" : [('2012', '2012_trainval')], 12 | "COCO": [('2014', 'train'), ('2014', 'valminusminival')], 13 | "VOC2007": [('0712', "2007_test")], 14 | "COCOval": [('2014', 'minival')]} 15 | 16 | 17 | voc_config = { 18 | 'anchors' : [[116, 90], [156, 198], [373, 326], 19 | [30, 61], [62, 45], [59, 119], 20 | [10, 13], [16, 30], [33, 23]], 21 | 'root': VOCroot, 22 | 'num_classes': 20, 23 | 'multiscale' : True, 24 | 'name_path' : "./model/voc.names", 25 | 'anchors_mask' : [[0,1,2], [3,4,5], [6,7,8]] 26 | } 27 | 28 | coco_config = { 29 | 'anchors' : [[116, 90], [156, 198], [373, 326], 30 | [30, 61], [62, 45], [59, 119], 31 | [10, 13], [16, 30], [33, 23]], 32 | 'root': COCOroot, 33 | 'num_classes': 80, 34 | 'multiscale' : True, 35 | 'name_path' : "./model/coco.names", 36 | 'anchors_mask' : [[0,1,2], [3,4,5], [6,7,8]] 37 | } 38 | 39 | # anchors = [[214,327], [326,193], [359,359], 40 | # [116,286], [122,97], [171,180], 41 | # [24,34], [46,84], [68,185]] -------------------------------------------------------------------------------- /data/data_augment.py: -------------------------------------------------------------------------------- 1 | """Data augmentation functionality. Passed as callable transformations to 2 | Dataset classes. 
3 | 4 | The data augmentation procedures were interpreted from @weiliu89's SSD paper 5 | http://arxiv.org/abs/1512.02325 6 | 7 | TODO: implement data_augment for training 8 | 9 | Ellis Brown, Max deGroot 10 | """ 11 | 12 | import torch 13 | from torchvision import transforms 14 | import cv2 15 | import numpy as np 16 | import random 17 | import math 18 | 19 | 20 | def matrix_iou(a,b): 21 | """ 22 | return iou of a and b, numpy version for data augenmentation 23 | """ 24 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2]) 25 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) 26 | 27 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2) 28 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) 29 | area_b = np.prod(b[:, 2:] - b[:, :2], axis=1) 30 | return area_i / (area_a[:, np.newaxis] + area_b - area_i) 31 | 32 | def _crop(image, boxes, labels): 33 | height, width, _ = image.shape 34 | 35 | if len(boxes)== 0: 36 | return image, boxes, labels 37 | 38 | while True: 39 | mode = random.choice(( 40 | None, 41 | (0.3, None), 42 | (0.5, None), 43 | (0.7, None), 44 | (0.9, None), 45 | # (None, None), 46 | )) 47 | 48 | if mode is None: 49 | return image, boxes, labels 50 | 51 | min_iou, max_iou = mode 52 | if min_iou is None: 53 | min_iou = float('-inf') 54 | if max_iou is None: 55 | max_iou = float('inf') 56 | 57 | for _ in range(50): 58 | scale = random.uniform(0.3,1.) 59 | min_ratio = max(0.5, scale*scale) 60 | max_ratio = min(2, 1. / scale / scale) 61 | ratio = math.sqrt(random.uniform(min_ratio, max_ratio)) 62 | w = int(scale * ratio * width) 63 | h = int((scale / ratio) * height) 64 | 65 | 66 | l = random.randrange(width - w) 67 | t = random.randrange(height - h) 68 | roi = np.array((l, t, l + w, t + h)) 69 | 70 | iou = matrix_iou(boxes, roi[np.newaxis]) 71 | 72 | if not (min_iou <= iou.min() and iou.max() <= max_iou): 73 | continue 74 | 75 | image_t = image[roi[1]:roi[3], roi[0]:roi[2]] 76 | 77 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2 78 | mask = np.logical_and(roi[:2] < centers, centers < roi[2:]) \ 79 | .all(axis=1) 80 | boxes_t = boxes[mask].copy() 81 | labels_t = labels[mask].copy() 82 | if len(boxes_t) == 0: 83 | continue 84 | 85 | boxes_t[:, :2] = np.maximum(boxes_t[:, :2], roi[:2]) 86 | boxes_t[:, :2] -= roi[:2] 87 | boxes_t[:, 2:] = np.minimum(boxes_t[:, 2:], roi[2:]) 88 | boxes_t[:, 2:] -= roi[:2] 89 | 90 | return image_t, boxes_t,labels_t 91 | 92 | 93 | def _distort(image): 94 | def _convert(image, alpha=1, beta=0): 95 | tmp = image.astype(float) * alpha + beta 96 | tmp[tmp < 0] = 0 97 | tmp[tmp > 255] = 255 98 | image[:] = tmp 99 | 100 | image = image.copy() 101 | 102 | if random.randrange(2): 103 | _convert(image, beta=random.uniform(-32, 32)) 104 | 105 | if random.randrange(2): 106 | _convert(image, alpha=random.uniform(0.5, 1.5)) 107 | 108 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 109 | 110 | if random.randrange(2): 111 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18) 112 | tmp %= 180 113 | image[:, :, 0] = tmp 114 | 115 | if random.randrange(2): 116 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5)) 117 | 118 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 119 | 120 | return image 121 | 122 | 123 | def _expand(image, boxes,fill, p): 124 | if random.random() > p: 125 | return image, boxes 126 | 127 | height, width, depth = image.shape 128 | for _ in range(50): 129 | scale = random.uniform(1,4) 130 | 131 | min_ratio = max(0.5, 1./scale/scale) 132 | max_ratio = min(2, scale*scale) 133 | ratio = math.sqrt(random.uniform(min_ratio, max_ratio)) 134 | ws 
= scale * ratio 135 | hs = scale / ratio 136 | if ws < 1 or hs < 1: 137 | continue 138 | w = int(ws * width) 139 | h = int(hs * height) 140 | 141 | left = random.randint(0, w - width) 142 | top = random.randint(0, h - height) 143 | 144 | boxes_t = boxes.copy() 145 | boxes_t[:, :2] += (left, top) 146 | boxes_t[:, 2:] += (left, top) 147 | 148 | 149 | expand_image = np.empty( 150 | (h, w, depth), 151 | dtype=image.dtype) 152 | expand_image[:, :] = fill 153 | expand_image[top:top + height, left:left + width] = image 154 | image = expand_image 155 | 156 | return image, boxes_t 157 | 158 | 159 | def _mirror(image, boxes): 160 | _, width, _ = image.shape 161 | if random.randrange(2): 162 | image = image[:, ::-1] 163 | boxes = boxes.copy() 164 | boxes[:, 0::2] = width - boxes[:, 2::-2] 165 | return image, boxes 166 | 167 | def rand(a=0, b=1): 168 | return np.random.rand()*(b-a) + a 169 | 170 | # def random_letterbox_image(img, resize_wh, boxes, jitter=0.3): 171 | # '''resize image with unchanged aspect ratio using padding''' 172 | # img_w, img_h = img.shape[1], img.shape[0] 173 | # w, h = resize_wh 174 | # new_ar = w / h * rand(1-jitter, 1+jitter)/rand(1-jitter, 1+jitter) 175 | # scale = rand(.25, 2) 176 | # if new_ar < 1: 177 | # nh = int(scale * h) 178 | # nw = int(nh * new_ar) 179 | # else: 180 | # nw = int(scale * w) 181 | # nh = int(nw / new_ar) 182 | # resized_image = cv2.resize(img, (nw, nh), interpolation = cv2.INTER_CUBIC) 183 | 184 | # dx = int(rand(0, w - nw)) 185 | # dy = int(rand(0, h - nh)) 186 | 187 | # if (w - nw) < 0: 188 | # cxmin = 0 189 | # xmin = nw - w + dx 190 | # xmax = nw + dx 191 | # cxmax = xmax - xmin 192 | # else: 193 | # cxmin = dx 194 | # xmin = 0 195 | # xmax = nw 196 | # cxmax = nw + dx 197 | # if (h - nh) < 0: 198 | # cymin = 0 199 | # ymin = nh - h + dy 200 | # ymax = nh + dy 201 | # cymax = ymax - ymin 202 | # else: 203 | # cymin = dy 204 | # ymin = 0 205 | # ymax = nh 206 | # cymax = nh + dy 207 | 208 | # resized_image = resized_image[ymin:ymax,xmin:xmax,:] 209 | 210 | # boxes[:, 0::2] = (boxes[:, 0::2] * nw / img_w + dx) / w 211 | # boxes[:, 1::2] = (boxes[:, 1::2] * nh / img_h + dy ) / h 212 | # # clamp boxes 213 | # boxes[:, 0:2][boxes[:, 0:2]<=0] = 0 214 | # boxes[:, 2:][boxes[:, 2:]>=1] = 0.9999 215 | 216 | # canvas = np.full((resize_wh[1], resize_wh[0], 3), 128) 217 | # canvas[cymin:cymax, cxmin:cxmax, :] = resized_image 218 | 219 | # img_ = canvas[:,:,::-1].transpose((2,0,1)).copy() 220 | # img_ = torch.from_numpy(img_).float().div(255.0) 221 | # return img_, boxes 222 | 223 | def letterbox_image(img, resize_wh, boxes): 224 | '''resize image with unchanged aspect ratio using padding''' 225 | img_w, img_h = img.shape[1], img.shape[0] 226 | w, h = resize_wh 227 | new_w = int(img_w * min(w/img_w, h/img_h)) 228 | new_h = int(img_h * min(w/img_w, h/img_h)) 229 | 230 | if len(boxes) > 0: 231 | boxes = boxes.copy() 232 | dim_diff = np.abs(img_w - img_h) 233 | max_size = max(img_w, img_h) 234 | if img_w > img_h: 235 | boxes[:, 1::2] += dim_diff // 2 236 | else: 237 | boxes[:, 0::2] += dim_diff // 2 238 | boxes[:, 0::2] /= max_size 239 | boxes[:, 1::2] /= max_size 240 | resized_image = cv2.resize(img, (new_w, new_h), interpolation = cv2.INTER_CUBIC) 241 | canvas = np.full((resize_wh[0], resize_wh[1], 3), 128) 242 | canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image 243 | img_ = canvas[:,:,::-1].transpose((2,0,1)).copy() 244 | img_ = torch.from_numpy(img_).float().div(255.0) 245 | 246 | return img_, boxes 247 | 248 | 249 
| class preproc(object): 250 | 251 | def __init__(self): 252 | self.means = [128, 128, 128] 253 | self.p = 0.5 254 | 255 | def __call__(self, image, targets, resize_wh, use_pad=True): 256 | boxes = targets[:, :-1].copy() 257 | labels = targets[:, -1].copy() 258 | height, width, _ = image.shape 259 | if len(boxes) == 0: 260 | targets = np.zeros((1,5)) 261 | image, _ = letterbox_image(image, resize_wh, boxes) 262 | return image, targets 263 | image_o = image.copy() 264 | targets_o = targets.copy() 265 | image_t, boxes, labels = _crop(image, boxes, labels) 266 | image_t = _distort(image_t) 267 | image_t, boxes = _expand(image_t, boxes, self.means, self.p) 268 | image_t, boxes = _mirror(image_t, boxes) 269 | image_t, boxes = letterbox_image(image_t, resize_wh, boxes) 270 | 271 | boxes = boxes.copy() 272 | b_w = (boxes[:, 2] - boxes[:, 0])*1. 273 | b_h = (boxes[:, 3] - boxes[:, 1])*1. 274 | mask_b= np.minimum(b_w, b_h) > 0.01 275 | boxes_t = boxes[mask_b] 276 | labels_t = labels[mask_b].copy() 277 | 278 | if len(boxes_t) == 0: 279 | boxes_t = targets_o[:, :4].copy() 280 | labels_t = targets_o[:, -1].copy() 281 | image_t, boxes_t = letterbox_image(image_o, resize_wh, boxes_t) 282 | 283 | boxes_t[:, 0::2] *= resize_wh[0] 284 | boxes_t[:, 1::2] *= resize_wh[1] 285 | 286 | labels_t = np.expand_dims(labels_t, 1) 287 | targets_t = np.hstack((boxes_t, labels_t)) 288 | 289 | return image_t, targets_t 290 | 291 | 292 | -------------------------------------------------------------------------------- /data/voc0712.py: -------------------------------------------------------------------------------- 1 | """VOC Dataset Classes 2 | 3 | Original author: Francisco Massa 4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py 5 | 6 | Updated by: Ellis Brown, Max deGroot 7 | """ 8 | 9 | import os 10 | import os.path 11 | import pickle 12 | import sys 13 | import torch 14 | import torch.utils.data as data 15 | from PIL import Image, ImageDraw, ImageFont 16 | import cv2 17 | import numpy as np 18 | from .voc_eval import voc_eval 19 | import random 20 | from .data_augment import preproc 21 | if sys.version_info[0] == 2: 22 | import xml.etree.cElementTree as ET 23 | else: 24 | import xml.etree.ElementTree as ET 25 | 26 | VOC_CLASSES = ( 27 | 'aeroplane', 'bicycle', 'bird', 'boat', 28 | 'bottle', 'bus', 'car', 'cat', 'chair', 29 | 'cow', 'diningtable', 'dog', 'horse', 30 | 'motorbike', 'person', 'pottedplant', 31 | 'sheep', 'sofa', 'train', 'tvmonitor') 32 | 33 | # for making bounding boxes pretty 34 | COLORS = ((255, 0, 0, 128), (0, 255, 0, 128), (0, 0, 255, 128), 35 | (0, 255, 255, 128), (255, 0, 255, 128), (255, 255, 0, 128)) 36 | 37 | 38 | class VOCAnnotationTransform(object): 39 | """Transforms a VOC annotation into a Tensor of bbox coords and label index 40 | Initilized with a dictionary lookup of classnames to indexes 41 | 42 | Arguments: 43 | class_to_ind (dict, optional): dictionary lookup of classnames -> indexes 44 | (default: alphabetic indexing of VOC's 20 classes) 45 | keep_difficult (bool, optional): keep difficult instances or not 46 | (default: False) 47 | height (int): height 48 | width (int): width 49 | """ 50 | 51 | def __init__(self, class_to_ind=None, keep_difficult=False): 52 | self.class_to_ind = class_to_ind or dict( 53 | zip(VOC_CLASSES, range(len(VOC_CLASSES)))) 54 | self.keep_difficult = keep_difficult 55 | 56 | def __call__(self, target, width, height): 57 | """ 58 | Arguments: 59 | target (annotation) : the target annotation to be made usable 60 | will be an 
ET.Element 61 | Returns: 62 | a list containing lists of bounding boxes [bbox coords, class name] 63 | """ 64 | res = np.empty((0,5)) 65 | for obj in target.iter('object'): 66 | difficult = int(obj.find('difficult').text) == 1 67 | if not self.keep_difficult and difficult: 68 | continue 69 | name = obj.find('name').text.lower().strip() 70 | bbox = obj.find('bndbox') 71 | 72 | pts = ['xmin', 'ymin', 'xmax', 'ymax'] 73 | bndbox = [] 74 | for i, pt in enumerate(pts): 75 | cur_pt = int(bbox.find(pt).text) 76 | bndbox.append(cur_pt) 77 | label_idx = self.class_to_ind[name] 78 | bndbox.append(label_idx) 79 | res = np.vstack((res, bndbox)) 80 | return res 81 | 82 | 83 | class VOCDetection(data.Dataset): 84 | """VOC Detection Dataset Object 85 | 86 | input is image, target is annotation 87 | 88 | Arguments: 89 | root (string): filepath to VOCdevkit folder. 90 | image_set (string): imageset to use (eg. 'train', 'val', 'test') 91 | transform (callable, optional): transformation to perform on the 92 | input image 93 | target_transform (callable, optional): transformation to perform on the 94 | target `annotation` 95 | (eg: take in caption string, return tensor of word indices) 96 | dataset_name (string, optional): which dataset to load 97 | (default: 'VOC2007') 98 | """ 99 | 100 | def __init__(self, root, image_sets, resize_wh, batch_size=16, multiscale=False, dataset_name='VOC0712'): 101 | self.root = root 102 | self.image_set = image_sets 103 | self.transform = preproc() 104 | self.resize_wh = resize_wh 105 | self.target_transform = VOCAnnotationTransform() 106 | self.name = dataset_name 107 | self.multiscale = multiscale 108 | self.batch_size = batch_size 109 | self._annopath = os.path.join('%s', 'Annotations', '%s.xml') 110 | self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg') 111 | self.ids = list() 112 | for (year, name) in image_sets: 113 | self._year = year 114 | rootpath = os.path.join(self.root, 'VOC' + year) 115 | for line in open(os.path.join(rootpath, 'ImageSets', 'Main', name + '.txt')): 116 | self.ids.append((rootpath, line.strip())) 117 | 118 | def __getitem__(self, index): 119 | img_id = self.ids[index] 120 | # multiscale train 121 | if self.multiscale: 122 | if index % (self.batch_size * 20) == 0: 123 | rnd = (random.randint(0,9) + 10) * 32 124 | print("resize scale", index, rnd) 125 | self.resize_wh = (rnd, rnd) 126 | target = ET.parse(self._annopath % img_id).getroot() 127 | img = cv2.imread(self._imgpath % img_id) 128 | height, width, channels = img.shape 129 | if self.target_transform is not None: 130 | target = self.target_transform(target, width, height) 131 | 132 | if self.transform is not None: 133 | img, target = self.transform(img, target, self.resize_wh) 134 | 135 | 136 | return img, target 137 | 138 | 139 | def __len__(self): 140 | return len(self.ids) 141 | 142 | def pull_image(self, index): 143 | '''Returns the original image object at index in PIL form 144 | 145 | Note: not using self.__getitem__(), as any transformations passed in 146 | could mess up this functionality. 147 | 148 | Argument: 149 | index (int): index of img to show 150 | Return: 151 | PIL img 152 | ''' 153 | img_id = self.ids[index] 154 | return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR), img_id 155 | 156 | def pull_anno(self, index): 157 | '''Returns the original annotation of image at index 158 | 159 | Note: not using self.__getitem__(), as any transformations passed in 160 | could mess up this functionality. 
161 | 162 | Argument: 163 | index (int): index of img to get annotation of 164 | Return: 165 | list: [img_id, [(label, bbox coords),...]] 166 | eg: ('001718', [('dog', (96, 13, 438, 332))]) 167 | ''' 168 | img_id = self.ids[index] 169 | anno = ET.parse(self._annopath % img_id).getroot() 170 | gt = self.target_transform(anno, 1, 1) 171 | return img_id[1], gt 172 | 173 | def pull_tensor(self, index): 174 | '''Returns the original image at an index in tensor form 175 | 176 | Note: not using self.__getitem__(), as any transformations passed in 177 | could mess up this functionality. 178 | 179 | Argument: 180 | index (int): index of img to show 181 | Return: 182 | tensorized version of img, squeezed 183 | ''' 184 | return torch.Tensor(self.pull_image(index)).unsqueeze_(0) 185 | 186 | def evaluate_detections(self, all_boxes, output_dir=None): 187 | """ 188 | all_boxes is a list of length number-of-classes. 189 | Each list element is a list of length number-of-images. 190 | Each of those list elements is either an empty list [] 191 | or a numpy array of detection. 192 | 193 | all_boxes[class][image] = [] or np.array of shape #dets x 5 194 | """ 195 | self._write_voc_results_file(all_boxes) 196 | self._do_python_eval(output_dir) 197 | 198 | def _get_voc_results_file_template(self): 199 | filename = 'comp3_det_test' + '_{:s}.txt' 200 | filedir = os.path.join( 201 | self.root, 'results', 'VOC' + self._year, 'Main') 202 | if not os.path.exists(filedir): 203 | os.makedirs(filedir) 204 | path = os.path.join(filedir, filename) 205 | return path 206 | 207 | def _write_voc_results_file(self, all_boxes): 208 | for cls_ind, cls in enumerate(VOC_CLASSES): 209 | print('Writing {} VOC results file'.format(cls)) 210 | filename = self._get_voc_results_file_template().format(cls) 211 | # print(filename) 212 | with open(filename, 'wt') as f: 213 | for im_ind, index in enumerate(self.ids): 214 | index = index[1] 215 | dets = all_boxes[cls_ind][im_ind] 216 | if dets == []: 217 | continue 218 | for k in range(dets.shape[0]): 219 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 220 | format(index, dets[k, -1], 221 | dets[k, 0] + 1, dets[k, 1] + 1, 222 | dets[k, 2] + 1, dets[k, 3] + 1)) 223 | 224 | def _do_python_eval(self, output_dir='output'): 225 | rootpath = os.path.join(self.root, 'VOC' + self._year) 226 | name = self.image_set[0][1] 227 | annopath = os.path.join( 228 | rootpath, 229 | 'Annotations', 230 | '{:s}.xml') 231 | imagesetfile = os.path.join( 232 | rootpath, 233 | 'ImageSets', 234 | 'Main', 235 | name+'.txt') 236 | cachedir = os.path.join(self.root, 'annotations_cache') 237 | aps = [] 238 | # The PASCAL VOC metric changed in 2010 239 | use_07_metric = True if int(self._year) < 2010 else False 240 | # use_07_metric = True 241 | print('VOC07 metric? 
' + ('Yes' if use_07_metric else 'No')) 242 | if output_dir is not None and not os.path.isdir(output_dir): 243 | os.mkdir(output_dir) 244 | for i, cls in enumerate(VOC_CLASSES): 245 | 246 | filename = self._get_voc_results_file_template().format(cls) 247 | rec, prec, ap = voc_eval( 248 | filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5, 249 | use_07_metric=use_07_metric) 250 | aps += [ap] 251 | print('AP for {} = {:.4f}'.format(cls, ap)) 252 | if output_dir is not None: 253 | with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f: 254 | pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f) 255 | print('Mean AP = {:.4f}'.format(np.mean(aps))) 256 | print('~~~~~~~~') 257 | print('Results:') 258 | for ap in aps: 259 | print('{:.3f}'.format(ap)) 260 | print('{:.3f}'.format(np.mean(aps))) 261 | print('~~~~~~~~') 262 | print('') 263 | print('--------------------------------------------------------------') 264 | print('Results computed with the **unofficial** Python eval code.') 265 | print('Results should be very close to the official MATLAB eval code.') 266 | print('Recompute with `./tools/reval.py --matlab ...` for your paper.') 267 | print('-- Thanks, The Management') 268 | print('--------------------------------------------------------------') 269 | 270 | 271 | def detection_collate(batch): 272 | """Custom collate fn for dealing with batches of images that have a different 273 | number of associated object annotations (bounding boxes). 274 | 275 | Arguments: 276 | batch: (tuple) A tuple of tensor images and lists of annotations 277 | 278 | Return: 279 | A tuple containing: 280 | 1) (tensor) batch of images stacked on their 0 dim 281 | 2) (list of tensors) annotations for a given image are stacked on 0 dim 282 | """ 283 | targets = [] 284 | imgs = [] 285 | for sample in batch: 286 | imgs.append(sample[0]) 287 | targets.append(torch.FloatTensor(sample[1])) 288 | return torch.stack(imgs, 0), targets 289 | 290 | -------------------------------------------------------------------------------- /data/voc_eval.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast/er R-CNN 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # Written by Bharath Hariharan 5 | # -------------------------------------------------------- 6 | 7 | import xml.etree.ElementTree as ET 8 | import os 9 | import pickle 10 | import numpy as np 11 | import pdb 12 | 13 | 14 | def parse_rec(filename): 15 | """ Parse a PASCAL VOC xml file """ 16 | tree = ET.parse(filename) 17 | objects = [] 18 | for obj in tree.findall('object'): 19 | obj_struct = {} 20 | obj_struct['name'] = obj.find('name').text 21 | obj_struct['pose'] = obj.find('pose').text 22 | obj_struct['truncated'] = int(obj.find('truncated').text) 23 | obj_struct['difficult'] = int(obj.find('difficult').text) 24 | bbox = obj.find('bndbox') 25 | obj_struct['bbox'] = [int(bbox.find('xmin').text), 26 | int(bbox.find('ymin').text), 27 | int(bbox.find('xmax').text), 28 | int(bbox.find('ymax').text)] 29 | objects.append(obj_struct) 30 | 31 | return objects 32 | 33 | 34 | 35 | def voc_ap(rec, prec, use_07_metric=False): 36 | """ ap = voc_ap(rec, prec, [use_07_metric]) 37 | Compute VOC AP given precision and recall. 38 | If use_07_metric is true, uses the 39 | VOC 07 11 point method (default:False). 40 | """ 41 | if use_07_metric: 42 | # 11 point metric 43 | ap = 0. 
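# average the best precision found at the 11 recall levels 0.0, 0.1, ..., 1.0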
44 | for t in np.arange(0., 1.1, 0.1): 45 | if np.sum(rec >= t) == 0: 46 | p = 0 47 | else: 48 | p = np.max(prec[rec >= t]) 49 | ap = ap + p / 11. 50 | else: 51 | # correct AP calculation 52 | # first append sentinel values at the end 53 | mrec = np.concatenate(([0.], rec, [1.])) 54 | mpre = np.concatenate(([0.], prec, [0.])) 55 | 56 | # compute the precision envelope 57 | for i in range(mpre.size - 1, 0, -1): 58 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 59 | 60 | # to calculate area under PR curve, look for points 61 | # where X axis (recall) changes value 62 | i = np.where(mrec[1:] != mrec[:-1])[0] 63 | 64 | # and sum (\Delta recall) * prec 65 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 66 | return ap 67 | 68 | def voc_eval(detpath, 69 | annopath, 70 | imagesetfile, 71 | classname, 72 | cachedir, 73 | ovthresh=0.5, 74 | use_07_metric=False): 75 | """rec, prec, ap = voc_eval(detpath, 76 | annopath, 77 | imagesetfile, 78 | classname, 79 | [ovthresh], 80 | [use_07_metric]) 81 | 82 | Top level function that does the PASCAL VOC evaluation. 83 | 84 | detpath: Path to detections 85 | detpath.format(classname) should produce the detection results file. 86 | annopath: Path to annotations 87 | annopath.format(imagename) should be the xml annotations file. 88 | imagesetfile: Text file containing the list of images, one image per line. 89 | classname: Category name (duh) 90 | cachedir: Directory for caching the annotations 91 | [ovthresh]: Overlap threshold (default = 0.5) 92 | [use_07_metric]: Whether to use VOC07's 11 point AP computation 93 | (default False) 94 | """ 95 | # assumes detections are in detpath.format(classname) 96 | # assumes annotations are in annopath.format(imagename) 97 | # assumes imagesetfile is a text file with each line an image name 98 | # cachedir caches the annotations in a pickle file 99 | 100 | # first load gt 101 | if not os.path.isdir(cachedir): 102 | os.mkdir(cachedir) 103 | cachefile = os.path.join(cachedir, imagesetfile.split(".")[0]+'_annots.pkl') 104 | # read list of images 105 | with open(imagesetfile, 'r') as f: 106 | lines = f.readlines() 107 | imagenames = [x.strip() for x in lines] 108 | 109 | if not os.path.isfile(cachefile): 110 | # load annots 111 | recs = {} 112 | for i, imagename in enumerate(imagenames): 113 | recs[imagename] = parse_rec(annopath.format(imagename)) 114 | if i % 100 == 0: 115 | print('Reading annotation for {:d}/{:d}'.format( 116 | i + 1, len(imagenames))) 117 | # save 118 | print('Saving cached annotations to {:s}'.format(cachefile)) 119 | with open(cachefile, 'wb') as f: 120 | pickle.dump(recs, f) 121 | else: 122 | # load 123 | with open(cachefile, 'rb') as f: 124 | recs = pickle.load(f) 125 | 126 | # extract gt objects for this class 127 | class_recs = {} 128 | npos = 0 129 | for imagename in imagenames: 130 | R = [obj for obj in recs[imagename] if obj['name'] == classname] 131 | bbox = np.array([x['bbox'] for x in R]) 132 | difficult = np.array([x['difficult'] for x in R]).astype(np.bool) 133 | det = [False] * len(R) 134 | npos = npos + sum(~difficult) 135 | class_recs[imagename] = {'bbox': bbox, 136 | 'difficult': difficult, 137 | 'det': det} 138 | 139 | # read dets 140 | detfile = detpath.format(classname) 141 | with open(detfile, 'r') as f: 142 | lines = f.readlines() 143 | 144 | splitlines = [x.strip().split(' ') for x in lines] 145 | image_ids = [x[0] for x in splitlines] 146 | confidence = np.array([float(x[1]) for x in splitlines]) 147 | BB = np.array([[float(z) for z in x[2:]] for x in splitlines]) 148 | # 
sort by confidence 149 | sorted_ind = np.argsort(-confidence) 150 | sorted_scores = np.sort(-confidence) 151 | BB = BB[sorted_ind, :] 152 | image_ids = [image_ids[x] for x in sorted_ind] 153 | 154 | # go down dets and mark TPs and FPs 155 | nd = len(image_ids) 156 | tp = np.zeros(nd) 157 | fp = np.zeros(nd) 158 | for d in range(nd): 159 | R = class_recs[image_ids[d]] 160 | bb = BB[d, :].astype(float) 161 | ovmax = -np.inf 162 | BBGT = R['bbox'].astype(float) 163 | 164 | if BBGT.size > 0: 165 | # compute overlaps 166 | # intersection 167 | ixmin = np.maximum(BBGT[:, 0], bb[0]) 168 | iymin = np.maximum(BBGT[:, 1], bb[1]) 169 | ixmax = np.minimum(BBGT[:, 2], bb[2]) 170 | iymax = np.minimum(BBGT[:, 3], bb[3]) 171 | iw = np.maximum(ixmax - ixmin + 1., 0.) 172 | ih = np.maximum(iymax - iymin + 1., 0.) 173 | inters = iw * ih 174 | 175 | # union 176 | uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) + 177 | (BBGT[:, 2] - BBGT[:, 0] + 1.) * 178 | (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters) 179 | 180 | overlaps = inters / uni 181 | ovmax = np.max(overlaps) 182 | jmax = np.argmax(overlaps) 183 | 184 | if ovmax > ovthresh: 185 | if not R['difficult'][jmax]: 186 | if not R['det'][jmax]: 187 | tp[d] = 1. 188 | R['det'][jmax] = 1 189 | else: 190 | fp[d] = 1. 191 | else: 192 | fp[d] = 1. 193 | 194 | # compute precision recall 195 | fp = np.cumsum(fp) 196 | tp = np.cumsum(tp) 197 | rec = tp / float(npos) 198 | # avoid divide by zero in case the first detection matches a difficult 199 | # ground truth 200 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) 201 | ap = voc_ap(rec, prec, use_07_metric) 202 | 203 | return rec, prec, ap 204 | -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | from __future__ import division 5 | import time 6 | import torch 7 | import os 8 | os.environ["CUDA_VISIBLE_DEVICES"] = "1" 9 | import torch.nn as nn 10 | import torch.backends.cudnn as cudnn 11 | import numpy as np 12 | import cv2 13 | import argparse 14 | import os.path as osp 15 | import math 16 | from model.yolo import Yolov3 17 | from utils.box_utils import draw_rects, detection_postprecess 18 | from data.config import voc_config, coco_config 19 | from utils.preprocess import preproc_for_test 20 | 21 | def arg_parse(): 22 | """ 23 | Parse arguements to the detect module 24 | """ 25 | parser = argparse.ArgumentParser(description='YOLO v3 Detection Module') 26 | 27 | parser.add_argument("--images", dest = 'images', help = 28 | "Image / Directory containing images to perform detection upon",default = "images", type = str) 29 | parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.1) 30 | parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4) 31 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416]) 32 | parser.add_argument("--save_path", dest = "save_path", help = "coco name path", default = './output') 33 | parser.add_argument("--dataset", dest = "dataset", help = "VOC or COCO", default = 'VOC') 34 | parser.add_argument("--weights", dest = 'weights', 35 | help = "weightsfile", 36 | default = "./weights/convert_yolov3_coco.pth", type = str) 37 | parser.add_argument('--cuda', default=True, type=str, 38 | help='Use cuda to train model') 39 | parser.add_argument('--use_pad', default=True, 
type=str, 40 | help='Use pad to resize images') 41 | return parser.parse_args() 42 | 43 | 44 | if __name__ == '__main__': 45 | args = arg_parse() 46 | weightsfile = args.weights 47 | confidence = args.confidence 48 | nms_thresh = args.nms_thresh 49 | images = args.images 50 | input_wh = args.input_wh 51 | cuda = args.cuda 52 | use_pad = args.use_pad 53 | save_path = args.save_path 54 | dataset = args.dataset 55 | if dataset[0] == "V": 56 | cfg = voc_config 57 | elif dataset[1] == "C": 58 | cfg = coco_config 59 | else: 60 | print("only support VOC and COCO datasets !!!") 61 | name_path = cfg["name_path"] 62 | num_classes = cfg["num_classes"] 63 | anchors = cfg["anchors"] 64 | 65 | with open(name_path, "r") as f: 66 | classes = [i.strip() for i in f.readlines()] 67 | try: 68 | im_list = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)] 69 | except NotADirectoryError: 70 | im_list = [] 71 | im_list.append(osp.join(osp.realpath('.'), images)) 72 | except FileNotFoundError: 73 | print ("No file or directory with the name {}".format(images)) 74 | exit() 75 | 76 | net = Yolov3("test", input_wh, anchors, cfg["anchors_mask"], num_classes) 77 | state_dict = torch.load(weightsfile) 78 | from collections import OrderedDict 79 | new_state_dict = OrderedDict() 80 | for k, v in state_dict.items(): 81 | head = k[:7] 82 | if head == 'module.': 83 | name = k[7:] # remove `module.` 84 | else: 85 | name = k 86 | new_state_dict[name] = v 87 | if cuda: 88 | net.cuda() 89 | cudnn.benchmark = True 90 | net.load_state_dict(new_state_dict) 91 | print("load weights successfully.....") 92 | net.eval() 93 | for img_path in im_list[:]: 94 | print(img_path) 95 | img = cv2.imread(img_path) 96 | ori_img = img.copy() 97 | ori_wh = (img.shape[1], img.shape[0]) 98 | img = preproc_for_test(img, input_wh, use_pad) 99 | if cuda: 100 | img = img.cuda() 101 | st = time.time() 102 | detection = net(img) 103 | detect_time = time.time() 104 | detection = detection_postprecess(detection, confidence, num_classes, input_wh, ori_wh, use_pad=use_pad, nms_conf=nms_thresh) 105 | nms_time = time.time() 106 | draw_img = draw_rects(ori_img, detection, classes) 107 | draw_time = time.time() 108 | save_img_path = os.path.join(save_path, "output_" + img_path.split("/")[-1]) 109 | cv2.imwrite(save_img_path, draw_img) 110 | final_time = time.time() - st 111 | 112 | print("detection time:", round(detect_time - st, 3), "nms_time:", round(nms_time - detect_time, 3), "draw_time:", round(draw_time - nms_time, 3), "final_time:", round(final_time ,3)) 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | -------------------------------------------------------------------------------- /eval.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | from __future__ import division 5 | import time 6 | import torch 7 | import os 8 | os.environ["CUDA_VISIBLE_DEVICES"] = "0" 9 | import torch.nn as nn 10 | from torch.autograd import Variable 11 | import torch.backends.cudnn as cudnn 12 | import numpy as np 13 | import cv2 14 | import argparse 15 | import os.path as osp 16 | import math 17 | import pickle 18 | from model.yolo import Yolov3 19 | from data.voc0712 import VOCDetection, detection_collate 20 | from data.coco import COCODetection 21 | from data.config import voc_config, coco_config, datasets_dict 22 | from utils.box_utils import draw_rects, detection_postprecess 23 | from utils.timer import Timer 24 | from utils.preprocess import 
preproc_for_test 25 | 26 | def arg_parse(): 27 | """ 28 | Parse arguements to the detect module 29 | 30 | """ 31 | parser = argparse.ArgumentParser(description='YOLO v3 Detection Module') 32 | 33 | parser.add_argument('--dataset', default='VOC', 34 | help='VOC ,VOC0712++ or COCO dataset') 35 | parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4) 36 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416]) 37 | parser.add_argument("--weights", dest = 'weights', 38 | help = "weightsfile", 39 | default = "./weights/yolov3_COCO_epoches_10_0607.pth", type = str) 40 | parser.add_argument('--cuda', default=True, type=str, 41 | help='Use cuda to train model') 42 | parser.add_argument('--use_pad', default=True, type=str, 43 | help='Use pad to resize images') 44 | parser.add_argument('--retest', default=False, type=bool, 45 | help='test cache results') 46 | parser.add_argument('--save_folder', default='./eval/', 47 | help='results path') 48 | return parser.parse_args() 49 | 50 | def test_net(cfg, save_folder, input_wh, net, cuda, testset, 51 | max_per_image=300, thresh=0.05, nms_conf=0.4): 52 | """Test a Fast R-CNN network on an image database.""" 53 | num_images = len(testset) 54 | # all detections are collected into: 55 | # all_boxes[cls][image] = N x 5 array of detections in 56 | # (x1, y1, x2, y2, score) 57 | num_images = len(testset) 58 | num_classes = cfg["num_classes"] 59 | all_boxes = [[[] for _ in range(num_images)] 60 | for _ in range(num_classes)] 61 | 62 | if not os.path.exists(save_folder): 63 | os.mkdir(save_folder) 64 | # timers 65 | _t = {'im_detect': Timer(), 'misc': Timer()} 66 | det_file = os.path.join(save_folder, 'detections.pkl') 67 | 68 | if args.retest: 69 | f = open(det_file,'rb') 70 | all_boxes = pickle.load(f) 71 | print('Evaluating detections') 72 | testset.evaluate_detections(all_boxes, save_folder) 73 | return 74 | 75 | for i in range(num_images): 76 | img, img_id = testset.pull_image(i) 77 | ori_wh = (img.shape[1], img.shape[0]) 78 | img = preproc_for_test(img, input_wh, use_pad) 79 | x = img 80 | if cuda: 81 | x = x.cuda() 82 | 83 | _t['im_detect'].tic() 84 | out = net(x) # forward pass 85 | detections = detection_postprecess(out, thresh, num_classes, input_wh, ori_wh, use_pad=use_pad, nms_conf=nms_conf) 86 | boxes, scores, cls_inds = detections[:, :4], detections[:,4], detections[:, -1] 87 | detect_time = _t['im_detect'].toc() 88 | if len(boxes) == 0: 89 | continue 90 | 91 | _t['misc'].tic() 92 | for j in range(num_classes): 93 | inds = np.where(cls_inds == j)[0] 94 | if len(inds) == 0: 95 | all_boxes[j][i] = np.empty([0, 5], dtype=np.float32) 96 | continue 97 | c_bboxes = boxes[inds] 98 | c_scores = scores[inds] 99 | c_dets = np.hstack((c_bboxes, c_scores[:, np.newaxis])).astype( 100 | np.float32, copy=False) 101 | all_boxes[j][i] = c_dets 102 | 103 | if max_per_image > 0: 104 | image_scores = np.hstack([all_boxes[j][i][:, -1] for j in range(num_classes)]) 105 | if len(image_scores) > max_per_image: 106 | image_thresh = np.sort(image_scores)[-max_per_image] 107 | for j in range(num_classes): 108 | keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0] 109 | all_boxes[j][i] = all_boxes[j][i][keep, :] 110 | nms_time = _t['misc'].toc() 111 | 112 | if i % 20 == 0: 113 | print('im_detect: {:d}/{:d} {:.3f}s {:.3f}s' 114 | .format(i + 1, num_images, detect_time, nms_time)) 115 | _t['im_detect'].clear() 116 | _t['misc'].clear() 117 | 118 | with open(det_file, 'wb') as f: 119 | 
pickle.dump(all_boxes, f, pickle.HIGHEST_PROTOCOL) 120 | print('Evaluating detections') 121 | testset.evaluate_detections(all_boxes, save_folder) 122 | 123 | if __name__ == '__main__': 124 | args = arg_parse() 125 | weightsfile = args.weights 126 | nms_thresh = args.nms_thresh 127 | input_wh = args.input_wh 128 | cuda = args.cuda 129 | use_pad = args.use_pad 130 | save_folder = args.save_folder 131 | dataset = args.dataset 132 | if dataset[0] == "V": 133 | cfg = voc_config 134 | test_dataset = VOCDetection(cfg["root"], datasets_dict["VOC2007"], input_wh) 135 | elif dataset[0] == "C": 136 | cfg = coco_config 137 | test_dataset = COCODetection(cfg["root"], datasets_dict["COCOval"], input_wh) 138 | else: 139 | print("only support VOC and COCO datasets !!!") 140 | 141 | print("load test_dataset successfully.....") 142 | 143 | with open(cfg["name_path"], "r") as f: 144 | classes = [i.strip() for i in f.readlines()] 145 | 146 | net = Yolov3("test", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 147 | state_dict = torch.load(weightsfile) 148 | from collections import OrderedDict 149 | new_state_dict = OrderedDict() 150 | for k, v in state_dict.items(): 151 | head = k[:7] 152 | if head == 'module.': 153 | name = k[7:] # remove `module.` 154 | else: 155 | name = k 156 | new_state_dict[name] = v 157 | 158 | if cuda: 159 | net.cuda() 160 | cudnn.benchmark = True 161 | net.load_state_dict(new_state_dict) 162 | print("load weights successfully.....") 163 | net.eval() 164 | 165 | top_k = 200 166 | confidence = 0.01 167 | test_net(cfg, save_folder, input_wh, net, args.cuda, test_dataset, top_k, confidence, nms_thresh) 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | -------------------------------------------------------------------------------- /images/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/dog.jpg -------------------------------------------------------------------------------- /images/eagle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/eagle.jpg -------------------------------------------------------------------------------- /images/person.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/person.jpg -------------------------------------------------------------------------------- /layers/multiyolo_loss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import numpy as np 9 | from .weight_mseloss import WeightMseLoss 10 | from utils.box_utils import targets_match_all, permute_sigmoid, decode 11 | 12 | class MultiYoloLoss(nn.Module): 13 | 14 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask, use_gpu=True): 15 | super(MultiYoloLoss, self).__init__() 16 | self.num_classes = num_classes 17 | self.ignore_thresh = ignore_thresh 18 | self.use_gpu = use_gpu 19 | self.anchors = anchors 20 | self.mse_loss = nn.MSELoss(size_average=False) 21 | self.bce_loss = nn.BCELoss(size_average=False) 22 | self.weight_mseloss = 
WeightMseLoss(size_average=False) 23 | self.input_wh = input_wh 24 | self.anchors_mask = anchors_mask 25 | 26 | def forward(self, x, targets, input_wh, debug=False): 27 | self.input_wh = input_wh 28 | l_data, m_data, h_data = x 29 | l_grid_wh = (l_data.size(3), l_data.size(2)) 30 | m_grid_wh = (m_data.size(3), m_data.size(2)) 31 | h_grid_wh = (h_data.size(3), h_data.size(2)) 32 | feature_dim = (l_grid_wh, m_grid_wh, h_grid_wh) 33 | batch_size = l_data.size(0) 34 | pred_l, stride_l = permute_sigmoid(l_data, self.input_wh, 3, self.num_classes) 35 | pred_m, stride_m = permute_sigmoid(m_data, self.input_wh, 3, self.num_classes) 36 | pred_h, stride_h = permute_sigmoid(h_data, self.input_wh, 3, self.num_classes) 37 | pred = torch.cat((pred_l, pred_m, pred_h), 1) 38 | 39 | anchors1 = self.anchors[self.anchors_mask[0][0]: self.anchors_mask[0][-1]+1] 40 | anchors2 = self.anchors[self.anchors_mask[1][0]: self.anchors_mask[1][-1]+1] 41 | anchors3 = self.anchors[self.anchors_mask[2][0]: self.anchors_mask[2][-1]+1] 42 | 43 | decode_l = decode(pred_l.new_tensor(pred_l).detach(), self.input_wh, anchors1, self.num_classes, stride_l) 44 | decode_m = decode(pred_m.new_tensor(pred_m).detach(), self.input_wh, anchors2, self.num_classes, stride_m) 45 | decode_h = decode(pred_h.new_tensor(pred_h).detach(), self.input_wh, anchors3, self.num_classes, stride_h) 46 | decode_pred = torch.cat((decode_l, decode_m, decode_h), 1) 47 | 48 | num_pred = pred_l.size(1) + pred_m.size(1) + pred_h.size(1) 49 | 50 | # prediction targets x,y,w,h,objectness, class 51 | pred_t = torch.Tensor(batch_size, num_pred, 6).cuda() 52 | # xywh scale, scale = 2 - truth.w * truth.h (if truth is normlized to 1) 53 | scale_t = torch.FloatTensor(batch_size, num_pred).cuda() 54 | # foreground targets mask 55 | fore_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 56 | 57 | # background targets mask, we only calculate the objectness pred loss 58 | back_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 59 | 60 | for idx in range(batch_size): 61 | # match all targets 62 | targets_match_all(self.input_wh, self.ignore_thresh, targets[idx], decode_pred[idx][:, :4], self.anchors, feature_dim, pred_t, scale_t, fore_mask_t, back_mask_t, num_pred, idx) 63 | 64 | scale_factor = scale_t[fore_mask_t].view(-1, 1) 65 | scale_factor = scale_factor.expand((scale_factor.size(0), 2)) 66 | cls_t = pred_t[..., 5][fore_mask_t].long().view(-1, 1) 67 | cls_pred = pred[..., 5:] 68 | 69 | # cls loss 70 | cls_fore_mask_t = fore_mask_t.new_tensor(fore_mask_t).view(batch_size, num_pred, 1).expand_as(cls_pred) 71 | cls_pred = cls_pred[cls_fore_mask_t].view(-1, self.num_classes) 72 | class_mask = cls_pred.data.new(cls_t.size(0), self.num_classes).fill_(0) 73 | class_mask.scatter_(1, cls_t, 1.) 
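# class_mask is now a one-hot target for each matched prediction; BCE over the
# per-class sigmoid outputs below gives YOLOv3's independent multi-label class loss.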
74 | cls_loss = self.bce_loss(cls_pred, class_mask) 75 | ave_cls = (class_mask * cls_pred).sum().item() / cls_pred.size(0) 76 | 77 | # conf loss 78 | conf_t = pred_t[..., 4] 79 | fore_conf_t = conf_t[fore_mask_t].view(-1, 1) 80 | back_conf_t = conf_t[back_mask_t].view(-1, 1) 81 | fore_conf_pred = pred[..., 4][fore_mask_t].view(-1, 1) 82 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1) 83 | fore_num = fore_conf_pred.size(0) 84 | back_num = back_conf_pred.size(0) 85 | Obj = fore_conf_pred.sum().item() / fore_num 86 | no_obj = back_conf_pred.sum().item() / back_num 87 | 88 | fore_conf_loss = self.bce_loss(fore_conf_pred, fore_conf_t) 89 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t) 90 | conf_loss = fore_conf_loss + back_conf_loss 91 | 92 | # loc loss 93 | loc_pred = pred[..., :4] 94 | loc_t = pred_t[..., :4] 95 | fore_mask_t = fore_mask_t.view(batch_size, num_pred, 1).expand_as(loc_pred) 96 | loc_t = loc_t[fore_mask_t].view(-1, 4) 97 | loc_pred = loc_pred[fore_mask_t].view(-1, 4) 98 | 99 | xy_t, wh_t = loc_t[:, :2], loc_t[:, 2:] 100 | xy_pred, wh_pred = loc_pred[:, :2], loc_pred[:, 2:] 101 | # xy_loss = F.binary_cross_entropy(xy_pred, xy_t, scale_factor, size_average=False) 102 | 103 | xy_loss = self.weight_mseloss(xy_pred, xy_t, scale_factor) / 2 104 | wh_loss = self.weight_mseloss(wh_pred, wh_t, scale_factor) / 2 105 | 106 | loc_loss = xy_loss + wh_loss 107 | 108 | loc_loss /= batch_size 109 | conf_loss /= batch_size 110 | cls_loss /= batch_size 111 | 112 | if debug: 113 | print("xy_loss", round(xy_loss.item(), 5), "wh_loss", round(wh_loss.item(), 5), "cls_loss", round(cls_loss.item(), 5), "ave_cls", round(ave_cls, 5), "Obj", round(Obj, 5), "no_obj", round(no_obj, 5), "fore_conf_loss", round(fore_conf_loss.item(), 5), 114 | "back_conf_loss", round(back_conf_loss.item(), 5)) 115 | 116 | loss = loc_loss + conf_loss + cls_loss 117 | 118 | return loss 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | -------------------------------------------------------------------------------- /layers/weight_mseloss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | from torch.autograd import Variable 9 | 10 | 11 | class WeightMseLoss(nn.Module): 12 | def __init__(self, size_average=True): 13 | super(WeightMseLoss, self).__init__() 14 | self.size_average = size_average 15 | 16 | def forward(self, inputs, targets, weights): 17 | ''' inputs is N * C 18 | targets is N * C 19 | weights is N * C 20 | ''' 21 | N = inputs.size(0) 22 | C = inputs.size(1) 23 | 24 | out = (targets - inputs) 25 | out = weights * torch.pow(out, 2) 26 | loss = out.sum() 27 | 28 | if self.size_average: 29 | loss = loss / (N * C) 30 | return loss 31 | -------------------------------------------------------------------------------- /layers/yolo_layer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import numpy as np 9 | import math 10 | from .weight_mseloss import WeightMseLoss 11 | from utils.box_utils import targets_match_single, permute_sigmoid, decode 12 | 13 | class YoloLayer(nn.Module): 14 | 15 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask,use_gpu=True): 16 | super(YoloLayer, 
self).__init__() 17 | self.num_classes = num_classes 18 | self.ignore_thresh = ignore_thresh 19 | self.use_gpu = use_gpu 20 | self.anchors = anchors 21 | self.anchors_mask = anchors_mask 22 | self.input_wh = input_wh 23 | self.mse_loss = nn.MSELoss(size_average=False) 24 | self.bce_loss = nn.BCELoss(size_average=False) 25 | self.weight_mseloss = WeightMseLoss(size_average=False) 26 | 27 | def forward(self, x, targets, input_wh, debug=False): 28 | self.input_wh = input_wh 29 | batch_size = x.size(0) 30 | # feature map size w, h, this produce wxh cells to predict 31 | grid_wh = (x.size(3), x.size(2)) 32 | x, stride = permute_sigmoid(x, input_wh, 3, self.num_classes) 33 | pred = x 34 | num_pred = pred.size(1) 35 | 36 | decode_pred = decode(pred.new_tensor(pred).detach(), self.input_wh, self.anchors[self.anchors_mask[0]: self.anchors_mask[-1]+1], self.num_classes, stride) 37 | 38 | # prediction targets x,y,w,h,objectness, class 39 | pred_t = torch.Tensor(batch_size, num_pred, 6).cuda() 40 | # xywh scale, scale = 2 - truth.w * truth.h (if truth is normlized to 1) 41 | scale_t = torch.FloatTensor(batch_size, num_pred).cuda() 42 | # foreground targets mask 43 | fore_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 44 | 45 | # background targets mask, we only calculate the objectness pred loss 46 | back_mask_t = torch.ByteTensor(batch_size, num_pred).cuda() 47 | 48 | for idx in range(batch_size): 49 | # match our targets 50 | targets_match_single(self.input_wh, self.ignore_thresh, targets[idx], decode_pred[idx][:, :4], self.anchors, self.anchors_mask, pred_t, scale_t, fore_mask_t, back_mask_t, grid_wh, idx) 51 | 52 | cls_t = pred_t[..., 5][fore_mask_t].long().view(-1, 1) 53 | cls_pred = pred[..., 5:] 54 | conf_t = pred_t[..., 4] 55 | if cls_t.size(0) == 0: 56 | print("grid_wh {} no matching anchors".format(grid_wh)) 57 | back_conf_t = conf_t[back_mask_t].view(-1, 1) 58 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1) 59 | back_num = back_conf_pred.size(0) 60 | no_obj = back_conf_pred.sum().item() / back_num 61 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t) 62 | if debug: 63 | print("grid_wh", grid_wh, "loc_loss", 0, "conf_loss", round(back_conf_loss.item(), 5), "cls_loss", 0, "Obj", 0, "no_obj", round(no_obj, 5)) 64 | return torch.zeros(1), back_conf_loss, torch.zeros(1) 65 | 66 | scale_factor = scale_t[fore_mask_t].view(-1, 1) 67 | scale_factor = scale_factor.expand((scale_factor.size(0), 2)) 68 | 69 | # cls loss 70 | cls_fore_mask_t = fore_mask_t.new_tensor(fore_mask_t).view(batch_size, num_pred, 1).expand_as(cls_pred) 71 | cls_pred = cls_pred[cls_fore_mask_t].view(-1, self.num_classes) 72 | class_mask = cls_pred.data.new(cls_t.size(0), self.num_classes).fill_(0) 73 | class_mask.scatter_(1, cls_t, 1.) 
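# ave_cls computed below is a monitoring statistic only (mean probability assigned
# to the true class); it is printed in debug mode and does not contribute to the loss.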
74 | cls_loss = self.bce_loss(cls_pred, class_mask) 75 | ave_cls = (class_mask * cls_pred).sum().item() / cls_pred.size(0) 76 | 77 | # conf loss 78 | fore_conf_t = conf_t[fore_mask_t].view(-1, 1) 79 | back_conf_t = conf_t[back_mask_t].view(-1, 1) 80 | fore_conf_pred = pred[..., 4][fore_mask_t].view(-1, 1) 81 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1) 82 | fore_num = fore_conf_pred.size(0) 83 | back_num = back_conf_pred.size(0) 84 | Obj = fore_conf_pred.sum().item() / fore_num 85 | no_obj = back_conf_pred.sum().item() / back_num 86 | 87 | fore_conf_loss = self.bce_loss(fore_conf_pred, fore_conf_t) 88 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t) 89 | conf_loss = fore_conf_loss + back_conf_loss 90 | 91 | # loc loss 92 | loc_pred = pred[..., :4] 93 | loc_t = pred_t[..., :4] 94 | fore_mask_t = fore_mask_t.view(batch_size, num_pred, 1).expand_as(loc_pred) 95 | loc_t = loc_t[fore_mask_t].view(-1, 4) 96 | loc_pred = loc_pred[fore_mask_t].view(-1, 4) 97 | 98 | xy_t, wh_t = loc_t[:, :2], loc_t[:, 2:] 99 | xy_pred, wh_pred = loc_pred[:, :2], loc_pred[:, 2:] 100 | # xy_loss = F.binary_cross_entropy(xy_pred, xy_t, scale_factor, size_average=False) 101 | 102 | xy_loss = self.weight_mseloss(xy_pred, xy_t, scale_factor) / 2 103 | wh_loss = self.weight_mseloss(wh_pred, wh_t, scale_factor) / 2 104 | 105 | loc_loss = xy_loss + wh_loss 106 | 107 | loc_loss /= batch_size 108 | conf_loss /= batch_size 109 | cls_loss /= batch_size 110 | 111 | if debug: 112 | print("grid_wh", grid_wh, "xy_loss", round(xy_loss.item(), 5), "wh_loss", round(wh_loss.item(), 5), "cls_loss", round(cls_loss.item(), 5), "ave_cls", round(ave_cls, 5), "Obj", round(Obj, 5), "no_obj", round(no_obj, 5), "fore_conf_loss", round(fore_conf_loss.item(), 5), 113 | "back_conf_loss", round(back_conf_loss.item(), 5)) 114 | 115 | return loc_loss, conf_loss, cls_loss 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | -------------------------------------------------------------------------------- /layers/yolo_loss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import torch.nn.init as init 9 | import os 10 | from layers.yolo_layer import YoloLayer 11 | 12 | 13 | class YoloLoss(nn.Module): 14 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask, use_gpu=True): 15 | super(YoloLoss, self).__init__() 16 | self.input_wh = input_wh 17 | self.num_classes = num_classes 18 | self.ignore_thresh = ignore_thresh 19 | self.use_gpu = use_gpu 20 | self.anchors = anchors 21 | self.anchors_mask = anchors_mask 22 | self.yolo_layer1 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[0]) 23 | self.yolo_layer2 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[1]) 24 | self.yolo_layer3 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[2]) 25 | 26 | def forward(self, inputs, targets, input_wh, debug): 27 | self.input_wh = input_wh 28 | x, y, z = inputs 29 | batch_size = x.size(0) 30 | loc_loss1, conf_loss1, cls_loss1 = self.yolo_layer1(x, targets, self.input_wh, debug) 31 | loc_loss2, conf_loss2, cls_loss2 = self.yolo_layer2(y, targets, self.input_wh, debug) 32 | loc_loss3, conf_loss3, cls_loss3 = self.yolo_layer3(z, targets, self.input_wh, debug) 33 | loc_loss = loc_loss1 + 
loc_loss2 + loc_loss3 34 | conf_loss = conf_loss1 + conf_loss2 + conf_loss3 35 | cls_loss = cls_loss1 + cls_loss2 + cls_loss3 36 | loss = loc_loss + conf_loss + cls_loss 37 | return loss -------------------------------------------------------------------------------- /make.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | cd ./utils/ 3 | 4 | CUDA_PATH=/usr/local/cuda/ 5 | 6 | python build.py build_ext --inplace 7 | # if you use anaconda3 maybe you need add this 8 | mv nms/cpu_nms.cpython-36m-x86_64-linux-gnu.so nms/cpu_nms.so 9 | mv nms/gpu_nms.cpython-36m-x86_64-linux-gnu.so nms/gpu_nms.so 10 | cd .. 11 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/model/__init__.py -------------------------------------------------------------------------------- /model/darknet53.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | 9 | class ConvBN(nn.Module): 10 | def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0): 11 | super().__init__() 12 | self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride, padding=padding, bias=False) 13 | self.bn = nn.BatchNorm2d(ch_out, momentum=0.01, eps=1e-05, affine=True) 14 | 15 | def forward(self, x): 16 | return F.leaky_relu(self.bn(self.conv(x)), negative_slope=0.1, inplace=True) 17 | 18 | class DarknetBlock(nn.Module): 19 | def __init__(self, ch_in): 20 | super().__init__() 21 | ch_hid = ch_in // 2 22 | self.conv1 = ConvBN(ch_in, ch_hid, kernel_size=1, stride=1, padding=0) 23 | self.conv2 = ConvBN(ch_hid, ch_in, kernel_size=3, stride=1, padding=1) 24 | 25 | def forward(self, x): 26 | out = self.conv1(x) 27 | out = self.conv2(out) 28 | return out + x 29 | 30 | class Darknet19(nn.Module): 31 | def __init__(self, size): 32 | super().__init__() 33 | self.conv = ConvBN(3, 32, kernel_size=3, stride=1, padding=1) 34 | self.layer1 = self._make_layer1() 35 | self.layer2 = self._make_layer2() 36 | self.layer3 = self._make_layer3() 37 | self.layer4 = self._make_layer4() 38 | self.layer5 = self._make_layer5() 39 | 40 | def _make_layer1(self): 41 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 42 | ConvBN(32, 64, kernel_size=3, stride=1, padding=1)] 43 | return nn.Sequential(*layers) 44 | 45 | def _make_layer2(self): 46 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 47 | ConvBN(64, 128, kernel_size=3, stride=1, padding=1), 48 | ConvBN(128, 64, kernel_size=1, stride=1, padding=1), 49 | ConvBN(64, 128, kernel_size=3, stride=1, padding=1)] 50 | return nn.Sequential(*layers) 51 | 52 | def _make_layer3(self): 53 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 54 | ConvBN(128, 256, kernel_size=3, stride=1, padding=1), 55 | ConvBN(256, 128, kernel_size=1, stride=1, padding=1), 56 | ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 57 | return nn.Sequential(*layers) 58 | 59 | def _make_layer4(self): 60 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 61 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1), 62 | ConvBN(512, 256, kernel_size=1, stride=1, padding=1), 63 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1), 64 | 
ConvBN(512, 256, kernel_size=1, stride=1, padding=1), 65 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 66 | return nn.Sequential(*layers) 67 | 68 | def _make_layer5(self): 69 | layers = [nn.MaxPool2d(kernel_size=2, stride=2), 70 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1), 71 | ConvBN(1024, 512, kernel_size=1, stride=1, padding=1), 72 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1), 73 | ConvBN(1024, 512, kernel_size=1, stride=1, padding=1), 74 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 75 | return nn.Sequential(*layers) 76 | 77 | def forward(self, x): 78 | out = self.conv(x) 79 | 80 | c1 = self.layer1(out) 81 | c2 = self.layer2(c1) 82 | c3 = self.layer3(c2) 83 | c4 = self.layer4(c3) 84 | c5 = self.layer5(c4) 85 | return (c3, c4, c5) 86 | 87 | 88 | class Darknet53(nn.Module): 89 | def __init__(self, num_blocks): 90 | super().__init__() 91 | self.conv = ConvBN(3, 32, kernel_size=3, stride=1, padding=1) 92 | self.layer1 = self._make_layer(32, num_blocks[0], stride=2) 93 | self.layer2 = self._make_layer(64, num_blocks[1], stride=2) 94 | self.layer3 = self._make_layer(128, num_blocks[2], stride=2) 95 | self.layer4 = self._make_layer(256, num_blocks[3], stride=2) 96 | self.layer5 = self._make_layer(512, num_blocks[4], stride=2) 97 | 98 | def _make_layer(self, ch_in, num_blocks, stride=1): 99 | layers = [ConvBN(ch_in, ch_in*2, stride=stride, padding=1)] 100 | for i in range(num_blocks): 101 | layers.append(DarknetBlock(ch_in * 2)) 102 | return nn.Sequential(*layers) 103 | 104 | def forward(self, x): 105 | out = self.conv(x) 106 | c1 = self.layer1(out) 107 | c2 = self.layer2(c1) 108 | c3 = self.layer3(c2) 109 | c4 = self.layer4(c3) 110 | c5 = self.layer5(c4) 111 | return (c3, c4, c5) 112 | 113 | -------------------------------------------------------------------------------- /model/yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | # 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | import torch.nn.init as init 9 | from model.darknet53 import Darknet53 10 | import os 11 | from utils.box_utils import permute_sigmoid, decode 12 | from layers.yolo_layer import YoloLayer 13 | 14 | def xavier(param): 15 | init.xavier_uniform(param) 16 | 17 | # kaiming_weights_init 18 | def weights_init(m): 19 | for key in m.state_dict(): 20 | if key.split('.')[-1] == 'weight': 21 | if 'conv' in key: 22 | init.kaiming_normal_(m.state_dict()[key], mode='fan_out') 23 | if 'bn' in key: 24 | m.state_dict()[key][...] = 1 25 | elif key.split('.')[-1] == 'bias': 26 | m.state_dict()[key][...] = 0 27 | 28 | 29 | # def weights_init(m): 30 | # for key in m.state_dict(): 31 | # if key.split('.')[-1] == 'weight': 32 | # if 'conv' in key: 33 | # init.xavier_uniform(m.state_dict()[key]) 34 | # if 'bn' in key: 35 | # m.state_dict()[key][...] = 1 36 | # elif key.split('.')[-1] == 'bias': 37 | # m.state_dict()[key][...] 
= 0 38 | 39 | class ConvBN(nn.Module): 40 | def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0): 41 | super().__init__() 42 | self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride, padding=padding, bias=False) 43 | self.bn = nn.BatchNorm2d(ch_out, momentum=0.01) 44 | 45 | def forward(self, x): 46 | return F.leaky_relu(self.bn(self.conv(x)), negative_slope=0.1, inplace=True) 47 | 48 | class DetectionLayer(nn.Module): 49 | def __init__(self, anchors, anchors_mask, input_wh, num_classes): 50 | super(DetectionLayer, self).__init__() 51 | self.anchors = anchors 52 | self.input_wh = input_wh 53 | self.anchors_mask = anchors_mask 54 | self.num_classes = num_classes 55 | 56 | def forward(self, x): 57 | l_data, m_data, h_data = x 58 | l_grid_wh = (l_data.size(3), l_data.size(2)) 59 | m_grid_wh = (m_data.size(3), m_data.size(2)) 60 | h_grid_wh = (h_data.size(3), h_data.size(2)) 61 | 62 | pred_l, stride_l = permute_sigmoid(l_data, self.input_wh, 3, self.num_classes) 63 | pred_m, stride_m = permute_sigmoid(m_data, self.input_wh, 3, self.num_classes) 64 | pred_h, stride_h = permute_sigmoid(h_data, self.input_wh, 3, self.num_classes) 65 | 66 | anchors1 = self.anchors[self.anchors_mask[0][0]: self.anchors_mask[0][-1]+1] 67 | anchors2 = self.anchors[self.anchors_mask[1][0]: self.anchors_mask[1][-1]+1] 68 | anchors3 = self.anchors[self.anchors_mask[2][0]: self.anchors_mask[2][-1]+1] 69 | 70 | decode_l = decode(pred_l.detach(), self.input_wh, anchors1, self.num_classes, stride_l) 71 | decode_m = decode(pred_m.detach(), self.input_wh, anchors2, self.num_classes, stride_m) 72 | decode_h = decode(pred_h.detach(), self.input_wh, anchors3, self.num_classes, stride_h) 73 | decode_pred = torch.cat((decode_l, decode_m, decode_h), 1) 74 | 75 | return decode_pred 76 | 77 | def predict_conv_list1(num_classes): 78 | layers = list() 79 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)] 80 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 81 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)] 82 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 83 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)] 84 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)] 85 | layers += [nn.Conv2d(1024, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)] 86 | return layers 87 | 88 | def predict_conv_list2(num_classes): 89 | layers = list() 90 | layers += [ConvBN(768, 256, kernel_size=1, stride=1, padding=0)] 91 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 92 | layers += [ConvBN(512, 256, kernel_size=1, stride=1, padding=0)] 93 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 94 | layers += [ConvBN(512, 256, kernel_size=1, stride=1, padding=0)] 95 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)] 96 | layers += [nn.Conv2d(512, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)] 97 | return layers 98 | 99 | def predict_conv_list3(num_classes): 100 | layers = list() 101 | layers += [ConvBN(384, 128, kernel_size=1, stride=1, padding=0)] 102 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 103 | layers += [ConvBN(256, 128, kernel_size=1, stride=1, padding=0)] 104 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 105 | layers += [ConvBN(256, 128, kernel_size=1, stride=1, padding=0)] 106 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)] 107 | layers += [nn.Conv2d(256, (5 + num_classes) 
* 3, kernel_size=1, stride=1, padding=0)] 108 | return layers 109 | 110 | class YOLOv3(nn.Module): 111 | def __init__(self, phase, num_blocks, anchors, anchors_mask, input_wh, num_classes): 112 | super().__init__() 113 | self.phase = phase 114 | self.extractor = Darknet53(num_blocks) 115 | self.predict_conv_list1 = nn.ModuleList(predict_conv_list1(num_classes)) 116 | self.smooth_conv1 = ConvBN(512, 256, kernel_size=1, stride=1, padding=0) 117 | self.predict_conv_list2 = nn.ModuleList(predict_conv_list2(num_classes)) 118 | self.smooth_conv2 = ConvBN(256, 128, kernel_size=1, stride=1, padding=0) 119 | self.predict_conv_list3 = nn.ModuleList(predict_conv_list3(num_classes)) 120 | if phase == "test": 121 | self.detection = DetectionLayer(anchors, anchors_mask, input_wh, num_classes) 122 | 123 | def forward(self, x, targets=None): 124 | c3, c4, c5 = self.extractor(x) 125 | x = c5 126 | # predict_list1 127 | for i in range(5): 128 | x = self.predict_conv_list1[i](x) 129 | smt1 = self.smooth_conv1(x) 130 | smt1 = F.upsample(smt1, scale_factor=2, mode='nearest') 131 | 132 | smt1 = torch.cat((smt1, c4), 1) 133 | for i in range(5, 7): 134 | x = self.predict_conv_list1[i](x) 135 | out1 = x 136 | 137 | x = smt1 138 | for i in range(5): 139 | x = self.predict_conv_list2[i](x) 140 | smt2 = self.smooth_conv2(x) 141 | smt2 = F.upsample(smt2, scale_factor=2, mode='nearest') 142 | smt2 = torch.cat((smt2, c3), 1) 143 | for i in range(5, 7): 144 | x = self.predict_conv_list2[i](x) 145 | out2 = x 146 | x = smt2 147 | for i in range(7): 148 | x = self.predict_conv_list3[i](x) 149 | out3 = x 150 | 151 | if self.phase == "test": 152 | detections = self.detection((out1, out2, out3)) 153 | return detections 154 | elif self.phase == "train": 155 | detections = (out1, out2, out3) 156 | return detections 157 | 158 | def load_weights(self, base_file): 159 | other, ext = os.path.splitext(base_file) 160 | if ext == '.pkl' or '.pth': 161 | print('Loading weights into state dict...') 162 | self.extractor.load_state_dict(torch.load(base_file)) 163 | print("initing darknet53 ......") 164 | self.predict_conv_list1.apply(weights_init) 165 | self.smooth_conv1.apply(weights_init) 166 | self.predict_conv_list2.apply(weights_init) 167 | self.smooth_conv2.apply(weights_init) 168 | self.predict_conv_list3.apply(weights_init) 169 | print('Finished!') 170 | else: 171 | print('Sorry only .pth and .pkl files supported.') 172 | 173 | def Yolov3(phase, input_wh, anchors, anchors_mask, num_classes): 174 | num_blocks = [1,2,8,8,4] 175 | return YOLOv3(phase, num_blocks, anchors, anchors_mask, input_wh, num_classes) 176 | -------------------------------------------------------------------------------- /output/output_dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_dog.jpg -------------------------------------------------------------------------------- /output/output_eagle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_eagle.jpg -------------------------------------------------------------------------------- /output/output_person.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_person.jpg 
-------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # Written by yq_yao 3 | 4 | import os 5 | os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" 6 | import torch 7 | import torch.nn as nn 8 | import torch.optim as optim 9 | import torch.backends.cudnn as cudnn 10 | import torch.nn.init as init 11 | import argparse 12 | import torch.utils.data as data 13 | from data.voc0712 import VOCDetection, detection_collate 14 | from data.coco import COCODetection 15 | from model.yolo import Yolov3 16 | from data.config import voc_config, coco_config 17 | from layers.yolo_loss import YoloLoss 18 | from layers.multiyolo_loss import MultiYoloLoss 19 | import numpy as np 20 | import time 21 | import os 22 | import sys 23 | 24 | 25 | def arg_parse(): 26 | """ 27 | Parse arguments to the train module 28 | """ 29 | parser = argparse.ArgumentParser( 30 | description='Yolov3 pytorch Training') 31 | parser.add_argument('-v', '--version', default='yolov3', 32 | help='') 33 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416]) 34 | parser.add_argument('-d', '--dataset', default='VOC', 35 | help='VOC or COCO dataset') 36 | parser.add_argument('-b', '--batch_size', default=64, 37 | type=int, help='Batch size for training') 38 | parser.add_argument('--basenet', default='./weights/convert_darknet53.pth', help='pretrained base model') 39 | parser.add_argument('--ignore_thresh', default=0.5, 40 | type=float, help='ignore_thresh') 41 | parser.add_argument('--subdivisions', default=4, 42 | type=int, help='subdivisions for large batch_size') 43 | parser.add_argument('--num_workers', default=4, 44 | type=int, help='Number of workers used in dataloading') 45 | parser.add_argument('--cuda', default=True, 46 | type=bool, help='Use cuda to train model') 47 | parser.add_argument('--merge_yolo_loss', default=True, 48 | type=bool, help='merge yolo loss') 49 | parser.add_argument('--lr', '--learning-rate', 50 | default=1e-3, type=float, help='initial learning rate') 51 | parser.add_argument('--ngpu', default=2, type=int, help='gpus') 52 | 53 | parser.add_argument('--resume_net', default=None, 54 | help='resume net for retraining') 55 | parser.add_argument('--resume_epoch', default=0, 56 | type=int, help='resume iter for retraining') 57 | parser.add_argument('-max','--max_epoch', default=200, 58 | type=int, help='max epoch for retraining') 59 | parser.add_argument('--save_folder', default='./weights/', 60 | help='Location to save checkpoint models') 61 | 62 | return parser.parse_args() 63 | 64 | def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size): 65 | """Sets the learning rate 66 | # Adapted from PyTorch Imagenet example: 67 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 68 | """ 69 | if iteration < 1000: 70 | # warm up training 71 | lr = 0.001 * pow((iteration)/1000, 4) 72 | else: 73 | lr = args.lr * (gamma ** (step_index)) 74 | for param_group in optimizer.param_groups: 75 | param_group['lr'] = lr 76 | return lr 77 | 78 | 79 | if __name__ == '__main__': 80 | args = arg_parse() 81 | basenet = args.basenet 82 | save_folder = args.save_folder 83 | input_wh = args.input_wh 84 | batch_size = args.batch_size 85 | weight_decay = 0.0005 86 | gamma = 0.1 87 | momentum = 0.9 88 | cuda = args.cuda 89 | dataset_name = args.dataset 90 | subdivisions = args.subdivisions 91 | ignore_thresh = 
args.ignore_thresh 92 | merge_yolo_loss = args.merge_yolo_loss 93 | if not os.path.exists(save_folder): 94 | os.mkdir(save_folder) 95 | if cuda and torch.cuda.is_available(): 96 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 97 | else: 98 | torch.set_default_tensor_type('torch.FloatTensor') 99 | 100 | # different datasets, include coco, voc0712 trainval, coco val 101 | datasets_version = {"VOC": [('0712', '0712_trainval')], 102 | "VOC0712++": [('0712', '0712_trainval_test')], 103 | "VOC2012" : [('2012', '2012_trainval')], 104 | "COCO": [('2014', 'train'), ('2014', 'valminusminival')], 105 | "VOC2007": [('0712', "2007_test")], 106 | "COCOval": [('2014', 'minival')]} 107 | 108 | print('Loading Dataset...') 109 | if dataset_name[0] == "V": 110 | cfg = voc_config 111 | train_dataset = VOCDetection(cfg["root"], datasets_version[dataset_name], input_wh, batch_size, cfg["multiscale"], dataset_name) 112 | elif dataset_name[0] == "C": 113 | cfg = coco_config 114 | train_dataset = COCODetection(cfg["root"], datasets_version[dataset_name], input_wh, batch_size, cfg["multiscale"], dataset_name) 115 | else: 116 | print('Unkown dataset!') 117 | 118 | # load Yolov3 net 119 | net = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"]) 120 | if args.resume_net == None: 121 | net.load_weights(basenet) 122 | else: 123 | state_dict = torch.load(args.resume_net) 124 | from collections import OrderedDict 125 | new_state_dict = OrderedDict() 126 | for k, v in state_dict.items(): 127 | head = k[:7] 128 | if head == 'module.': 129 | name = k[7:] # remove `module.` 130 | else: 131 | name = k 132 | new_state_dict[name] = v 133 | net.load_state_dict(new_state_dict) 134 | print('Loading resume network...') 135 | 136 | if args.ngpu > 1: 137 | net = torch.nn.DataParallel(net) 138 | 139 | if args.cuda: 140 | net.cuda() 141 | cudnn.benchmark = True 142 | 143 | optimizer = optim.SGD(net.parameters(), lr=args.lr, 144 | momentum=momentum, weight_decay=weight_decay) 145 | 146 | # load yolo loss 147 | if merge_yolo_loss: 148 | criterion = MultiYoloLoss(input_wh, cfg["num_classes"], ignore_thresh, cfg["anchors"], cfg["anchors_mask"]) 149 | else: 150 | criterion = YoloLoss(input_wh, cfg["num_classes"], ignore_thresh, cfg["anchors"], cfg["anchors_mask"]) 151 | net.train() 152 | ave_loss = -1 153 | epoch = 0 + args.resume_epoch 154 | mini_batch_size = int(batch_size / subdivisions) 155 | 156 | epoch_size = len(train_dataset) // (batch_size) 157 | max_iter = args.max_epoch * epoch_size 158 | 159 | stepvalues_VOC = (160 * epoch_size, 180 * epoch_size, 201 * epoch_size) 160 | stepvalues_COCO = (90 * epoch_size, 120 * epoch_size, 140 * epoch_size) 161 | stepvalues = (stepvalues_VOC, stepvalues_COCO)[args.dataset=='COCO'] 162 | 163 | print('Training', args.version, 'on', train_dataset.name) 164 | step_index = 0 165 | 166 | if args.resume_epoch > 0: 167 | start_iter = args.resume_epoch * epoch_size 168 | else: 169 | start_iter = 0 170 | 171 | lr = args.lr 172 | 173 | # begin to train 174 | for iteration in range(start_iter, max_iter): 175 | if iteration % epoch_size == 0: 176 | batch_iterator = iter(data.DataLoader(train_dataset, 177 | mini_batch_size, 178 | shuffle=False, 179 | num_workers=args.num_workers, 180 | collate_fn=detection_collate)) 181 | if (epoch % 5 == 0 and epoch > 0) or (epoch % 5 == 0 and epoch > 200): 182 | torch.save(net.state_dict(), args.save_folder+args.version+'_'+args.dataset + '_epoches_'+ 183 | repr(epoch) + '.pth') 184 | epoch += 1 185 | 186 | load_t0 = time.time() 187 
| if iteration in stepvalues: 188 | step_index += 1 189 | lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size) 190 | debug = False 191 | if iteration % 10 == 0: 192 | debug = True 193 | optimizer.zero_grad() 194 | loss_sum = 0 195 | for i in range(subdivisions): 196 | images, targets = next(batch_iterator) 197 | images.requires_grad_() 198 | if args.cuda: 199 | images = images.cuda() 200 | with torch.no_grad(): 201 | targets = [anno.cuda() for anno in targets] 202 | else: 203 | images = images 204 | with torch.no_grad(): 205 | targets = targets 206 | # forward 207 | resize_wh = images.size(3), images.size(2) 208 | out = net(images) 209 | loss = criterion(out, targets, resize_wh, debug) / subdivisions 210 | loss.backward() 211 | loss_sum += loss.item() 212 | 213 | if ave_loss < 0: 214 | ave_loss = loss_sum 215 | ave_loss = 0.1 * loss_sum + 0.9 * ave_loss 216 | optimizer.step() 217 | load_t1 = time.time() 218 | if iteration % 10 == 0: 219 | print('Epoch:' + repr(epoch) + ' || epochiter: ' + repr(iteration % epoch_size) + '/' + repr(epoch_size) 220 | + '|| Totel iter ' + 221 | repr(iteration) + ' Cur : %.4f Ave : %.4f' % (loss_sum, ave_loss) + 222 | ' iteration time: %.4f sec. ||' % (load_t1 - load_t0) + 'LR: %.5f' % (lr)) 223 | 224 | torch.save(net.state_dict(), args.save_folder+args.version+'_'+args.dataset + "_final"+ '.pth') 225 | 226 | 227 | 228 | 229 | 230 | -------------------------------------------------------------------------------- /utils/box_utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | import numpy as np 8 | import math 9 | import cv2 10 | import time 11 | from utils.nms_wrapper import nms 12 | 13 | 14 | def get_rects(detection, input_wh, ori_wh, use_pad=False): 15 | if len(detection) > 0: 16 | if use_pad: 17 | scaling_factor = min(input_wh[0] / ori_wh[0], input_wh[1] / ori_wh[1]) 18 | detection[:,[1,3]] -= (input_wh[0] - scaling_factor * ori_wh[0]) / 2 19 | detection[:,[2,4]] -= (input_wh[1] - scaling_factor * ori_wh[1]) / 2 20 | detection[:,1:5] /= scaling_factor 21 | else: 22 | detection[:,[1,3]] /= input_wh[0] 23 | detection[:,[2,4]] /= input_wh[1] 24 | detection[:, [1,3]] *= ori_wh[0] 25 | detection[:, [2,4]] *= ori_wh[1] 26 | for i in range(detection.shape[0]): 27 | detection[i, [1,3]] = torch.clamp(detection[i, [1,3]], 0.0, ori_wh[0]) 28 | detection[i, [2,4]] = torch.clamp(detection[i, [2,4]], 0.0, ori_wh[1]) 29 | return detection 30 | 31 | def draw_rects(img, rects, classes): 32 | print(rects) 33 | for rect in rects: 34 | if rect[5] > 0.1: 35 | left_top = (int(rect[0]), int(rect[1])) 36 | right_bottom = (int(rect[2]), int(rect[3])) 37 | score = round(rect[4], 3) 38 | cls_id = int(rect[-1]) 39 | label = "{0}".format(classes[cls_id]) 40 | class_len = len(classes) 41 | offset = cls_id * 123457 % class_len 42 | red = get_color(2, offset, class_len) 43 | green = get_color(1, offset, class_len) 44 | blue = get_color(0, offset, class_len) 45 | color = (blue, green, red) 46 | cv2.rectangle(img, left_top, right_bottom, color, 2) 47 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0] 48 | right_bottom = left_top[0] + t_size[0] + 3, left_top[1] - t_size[1] - 4 49 | cv2.rectangle(img, left_top, right_bottom, color, -1) 50 | cv2.putText(img, str(label)+str(score), (left_top[0], left_top[1] - t_size[1] - 4), cv2.FONT_HERSHEY_PLAIN, 
1, [225,255,255], 1) 51 | return img 52 | 53 | def get_color(c, x, max_val): 54 | colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]]) 55 | ratio = float(x) / max_val * 5 56 | i = int(math.floor(ratio)) 57 | j = int(math.ceil(ratio)) 58 | ratio = ratio - i 59 | r = (1-ratio) * colors[i][c] + ratio * colors[j][c] 60 | return int(r*255) 61 | 62 | 63 | def unique(tensor): 64 | tensor_np = tensor.cpu().numpy() 65 | unique_np = np.unique(tensor_np) 66 | unique_tensor = torch.from_numpy(unique_np) 67 | 68 | tensor_res = tensor.new(unique_tensor.shape) 69 | tensor_res.copy_(unique_tensor) 70 | return tensor_res 71 | 72 | def point_form(boxes): 73 | """ Convert prior_boxes to (xmin, ymin, xmax, ymax) 74 | representation for comparison to point form ground truth data. 75 | Args: 76 | boxes: (tensor) center-size default boxes from priorbox layers. 77 | Return: 78 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 79 | """ 80 | return torch.cat((boxes[:, :2] - boxes[:, 2:]/2, # xmin, ymin 81 | boxes[:, :2] + boxes[:, 2:]/2), 1) # xmax, ymax 82 | 83 | def center_size(boxes): 84 | """ Convert prior_boxes to (cx, cy, w, h) 85 | representation for comparison to center-size form ground truth data. 86 | Args: 87 | boxes: (tensor) point_form boxes 88 | Return: 89 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 90 | """ 91 | return torch.cat([(boxes[:, 2:] + boxes[:, :2])/2, boxes[:, 2:] - boxes[:, :2]], 1) # w, h 92 | 93 | 94 | def intersect(box_a, box_b): 95 | """ We resize both tensors to [A,B,2] without new malloc: 96 | [A,2] -> [A,1,2] -> [A,B,2] 97 | [B,2] -> [1,B,2] -> [A,B,2] 98 | Then we compute the area of intersect between box_a and box_b. 99 | Args: 100 | box_a: (tensor) bounding boxes, Shape: [A,4]. 101 | box_b: (tensor) bounding boxes, Shape: [B,4]. 102 | Return: 103 | (tensor) intersection area, Shape: [A,B]. 104 | """ 105 | # print(box_a) 106 | A = box_a.size(0) 107 | B = box_b.size(0) 108 | max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2), 109 | box_b[:, 2:].unsqueeze(0).expand(A, B, 2)) 110 | min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2), 111 | box_b[:, :2].unsqueeze(0).expand(A, B, 2)) 112 | inter = torch.clamp((max_xy - min_xy), min=0) 113 | return inter[:, :, 0] * inter[:, :, 1] 114 | 115 | 116 | def jaccard(box_a, box_b): 117 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 118 | is simply the intersection over union of two boxes. Here we operate on 119 | ground truth boxes and default boxes. 
120 | E.g.: 121 | A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 122 | Args: 123 | box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4] 124 | box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4] 125 | Return: 126 | jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)] 127 | """ 128 | inter = intersect(box_a, box_b) 129 | area_a = ((box_a[:, 2]-box_a[:, 0]) * 130 | (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter) # [A,B] 131 | area_b = ((box_b[:, 2]-box_b[:, 0]) * 132 | (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter) # [A,B] 133 | union = area_a + area_b - inter 134 | return inter / union # [A,B] 135 | 136 | def trans_anchors(anchors): 137 | new_anchors = torch.zeros((anchors.size(0), 4)) 138 | new_anchors[:, :2] += 2000 139 | new_anchors[:, 2:] = anchors[:,] 140 | return point_form(new_anchors) 141 | 142 | def trans_truths(truths): 143 | new_truths = torch.zeros((truths.size(0), 4)) 144 | new_truths[:, :2] += 2000 145 | new_truths[:, 2:] = truths[:, 2:4] 146 | return point_form(new_truths) 147 | 148 | def int_index(anchors_mask, val): 149 | for i in range(len(anchors_mask)): 150 | if val == anchors_mask[i]: 151 | return i 152 | return -1 153 | 154 | def encode_targets_all(input_wh, truths, labels, best_anchor_idx, anchors, feature_dim, num_pred, back_mask): 155 | scale = torch.ones(num_pred).cuda() 156 | encode_truths = torch.zeros((num_pred, 6)).cuda() 157 | fore_mask = torch.zeros(num_pred).cuda() 158 | # l_dim, m_dim, h_dim = feature_dim 159 | l_grid_wh, m_grid_wh, h_grid_wh = feature_dim 160 | for i in range(best_anchor_idx.size(0)): 161 | index = 0 162 | grid_wh = (0, 0) 163 | # mask [0, 1, 2] 164 | if best_anchor_idx[i].item() < 2.1: 165 | grid_wh = l_grid_wh 166 | index_begin = 0 167 | # mask [3, 4, 5] 168 | elif best_anchor_idx[i].item() < 5.1: 169 | grid_wh = m_grid_wh 170 | index_begin = l_grid_wh[0] * l_grid_wh[1] * 3 171 | # mask [6, 7, 8] 172 | else: 173 | grid_wh = h_grid_wh 174 | index_begin = (l_grid_wh[0]*l_grid_wh[1] + m_grid_wh[0]*m_grid_wh[1]) * 3 175 | x = (truths[i][0] / input_wh[0]) * grid_wh[0] 176 | y = (truths[i][1] / input_wh[1]) * grid_wh[1] 177 | floor_x, floor_y = math.floor(x), math.floor(y) 178 | anchor_idx = best_anchor_idx[i].int().item() % 3 179 | index = index_begin + floor_y * grid_wh[0] * 3 + floor_x * 3 + anchor_idx 180 | 181 | scale[index] = scale[index] + 1. - (truths[i][2] / input_wh[0]) * (truths[i][3] / input_wh[1]) 182 | 183 | # encode targets x, y, w, h, objectness, class 184 | truths[i][0] = x - floor_x 185 | truths[i][1] = y - floor_y 186 | truths[i][2] = torch.log(truths[i][2] / anchors[best_anchor_idx[i]][0] + 1e-8) 187 | truths[i][3] = torch.log(truths[i][3] / anchors[best_anchor_idx[i]][1] + 1e-8) 188 | encode_truths[index, :4] = truths[i] 189 | encode_truths[index, 4] = 1. 190 | encode_truths[index, 5] = labels[i].int().item() 191 | 192 | # set foreground mask to 1 and background mask to 0, because pred should have unique target 193 | fore_mask[index] = 1. 
194 | back_mask[index] = 0 195 | 196 | return encode_truths, fore_mask > 0, scale, back_mask 197 | 198 | def encode_targets_single(input_wh, truths, labels, best_anchor_idx, anchors, anchors_mask, back_mask, grid_wh): 199 | grid_w, grid_h = grid_wh[0], grid_wh[1] 200 | num_pred = grid_w * grid_h * len(anchors_mask) 201 | scale = torch.ones(num_pred).cuda() 202 | encode_truths = torch.zeros((num_pred, 6)).cuda() 203 | fore_mask = torch.zeros(num_pred).cuda() 204 | 205 | for i in range(best_anchor_idx.size(0)): 206 | mask_n = int_index(anchors_mask, best_anchor_idx[i]) 207 | if mask_n < 0: 208 | continue 209 | x = (truths[i][0] / input_wh[0]) * grid_wh[0] 210 | y = (truths[i][1] / input_wh[1]) * grid_wh[1] 211 | floor_x, floor_y = math.floor(x), math.floor(y) 212 | index = floor_y * grid_wh[0] * 3 + floor_x * 3 + mask_n 213 | scale[index] = scale[index] + 1. - (truths[i][2] / input_wh[0]) * (truths[i][3] / input_wh[1]) 214 | truths[i][0] = x - floor_x 215 | truths[i][1] = y - floor_y 216 | truths[i][2] = torch.log(truths[i][2] / anchors[best_anchor_idx[i]][0] + 1e-8) 217 | truths[i][3] = torch.log(truths[i][3] / anchors[best_anchor_idx[i]][1] + 1e-8) 218 | encode_truths[index, :4] = truths[i] 219 | encode_truths[index, 4] = 1. 220 | encode_truths[index, 5] = labels[i].int().item() 221 | fore_mask[index] = 1. 222 | back_mask[index] = 0 223 | 224 | return encode_truths, fore_mask > 0, scale, back_mask 225 | 226 | def targets_match_single(input_wh, threshold, targets, pred, anchors, anchors_mask, pred_t, scale_t, fore_mask_t, back_mask_t, grid_wh, idx, cuda=True): 227 | loc_truths = targets[:, :4].data 228 | labels = targets[:,-1].data 229 | overlaps = jaccard( 230 | loc_truths, 231 | point_form(pred)) 232 | # (Bipartite Matching) 233 | # [1,num_objects] best prior for each ground truth 234 | # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True) 235 | # [1,num_priors] best ground truth for each prior 236 | 237 | best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True) 238 | best_truth_idx.squeeze_(0) 239 | best_truth_overlap.squeeze_(0) 240 | back_mask = (best_truth_overlap - threshold) < 0 241 | 242 | anchors = torch.FloatTensor(anchors) 243 | if cuda: 244 | anchors = anchors.cuda() 245 | 246 | center_truths = center_size(loc_truths) 247 | 248 | # convert anchor and truths to calculate iou 249 | new_anchors = trans_anchors(anchors) 250 | new_truths = trans_truths(center_truths) 251 | overlaps_ = jaccard( 252 | new_truths, 253 | new_anchors) 254 | best_anchor_overlap, best_anchor_idx = overlaps_.max(1, keepdim=True) 255 | best_anchor_idx.squeeze_(1) 256 | best_anchor_overlap.squeeze_(1) 257 | 258 | encode_truths, fore_mask, scale, back_mask = encode_targets_single(input_wh, center_truths, labels, best_anchor_idx, anchors, anchors_mask, back_mask, grid_wh) 259 | 260 | pred_t[idx] = encode_truths 261 | scale_t[idx] = scale 262 | fore_mask_t[idx] = fore_mask 263 | back_mask_t[idx] = back_mask 264 | 265 | def targets_match_all(input_wh, threshold, targets, pred, anchors, feature_dim, pred_t, scale_t, fore_mask_t, back_mask_t, num_pred, idx, cuda=True): 266 | loc_truths = targets[:, :4].data 267 | labels = targets[:,-1].data 268 | overlaps = jaccard( 269 | loc_truths, 270 | point_form(pred)) 271 | # (Bipartite Matching) 272 | # [1,num_objects] best prior for each ground truth 273 | # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True) 274 | # [1,num_priors] best ground truth for each prior 275 | 276 | best_truth_overlap, best_truth_idx = overlaps.max(0, 
keepdim=True) 277 | best_truth_idx.squeeze_(0) 278 | best_truth_overlap.squeeze_(0) 279 | back_mask = (best_truth_overlap - threshold) < 0 280 | 281 | anchors = torch.FloatTensor(anchors) 282 | if cuda: 283 | anchors = anchors.cuda() 284 | 285 | center_truths = center_size(loc_truths) 286 | new_anchors = trans_anchors(anchors) 287 | new_truths = trans_truths(center_truths) 288 | overlaps_ = jaccard( 289 | new_truths, 290 | new_anchors) 291 | best_anchor_overlap, best_anchor_idx = overlaps_.max(1, keepdim=True) 292 | best_anchor_idx.squeeze_(1) 293 | best_anchor_overlap.squeeze_(1) 294 | 295 | encode_truths, fore_mask, scale, back_mask = encode_targets_all(input_wh, center_truths, labels, best_anchor_idx, anchors, feature_dim, num_pred, back_mask) 296 | 297 | pred_t[idx] = encode_truths 298 | scale_t[idx] = scale 299 | fore_mask_t[idx] = fore_mask 300 | back_mask_t[idx] = back_mask 301 | 302 | def decode(prediction, input_wh, anchors, num_classes, stride_wh, cuda=True): 303 | grid_wh = (input_wh[0] // stride_wh[0], input_wh[1] // stride_wh[1]) 304 | grid_w = np.arange(grid_wh[0]) 305 | grid_h = np.arange(grid_wh[1]) 306 | a,b = np.meshgrid(grid_w, grid_h) 307 | 308 | num_anchors = len(anchors) 309 | x_offset = torch.FloatTensor(a).view(-1,1) 310 | y_offset = torch.FloatTensor(b).view(-1,1) 311 | anchors = [(a[0]/stride_wh[0], a[1]/stride_wh[1]) for a in anchors] 312 | if cuda: 313 | x_offset = x_offset.cuda() 314 | y_offset = y_offset.cuda() 315 | x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1,2).unsqueeze(0) 316 | prediction[:,:,:2] += x_y_offset 317 | anchors = torch.FloatTensor(anchors) 318 | if cuda: 319 | anchors = anchors.cuda() 320 | anchors = anchors.repeat(grid_wh[0]*grid_wh[1], 1).unsqueeze(0) 321 | prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4]) * anchors 322 | prediction[:,:,0] *= stride_wh[0] 323 | prediction[:,:,2] *= stride_wh[0] 324 | prediction[:,:,1] *= stride_wh[1] 325 | prediction[:,:,3] *= stride_wh[1] 326 | return prediction 327 | 328 | def permute_sigmoid(x, input_wh, num_anchors, num_classes): 329 | batch_size = x.size(0) 330 | grid_wh = (x.size(3), x.size(2)) 331 | input_w, input_h = input_wh 332 | stride_wh = (input_w // grid_wh[0], input_h // grid_wh[1]) 333 | bbox_attrs = 5 + num_classes 334 | x = x.view(batch_size, bbox_attrs*num_anchors, grid_wh[0] * grid_wh[1]) 335 | x = x.transpose(1,2).contiguous() 336 | x = x.view(batch_size, grid_wh[0]*grid_wh[1]*num_anchors, bbox_attrs) 337 | x[:,:,0] = torch.sigmoid(x[:,:,0]) 338 | x[:,:,1] = torch.sigmoid(x[:,:,1]) 339 | x[:,:, 4 : bbox_attrs] = torch.sigmoid((x[:,:, 4 : bbox_attrs])) 340 | return x, stride_wh 341 | 342 | def detection_postprecess(detection, iou_thresh, num_classes, input_wh, ori_wh, use_pad=False, nms_conf=0.4): 343 | assert detection.size(0) == 1, "only support batch_size == 1" 344 | conf_mask = (detection[:,:,4] > iou_thresh).float().unsqueeze(2) 345 | detection = detection * conf_mask 346 | try: 347 | ind_nz = torch.nonzero(detection[:,:,4]).transpose(0,1).contiguous() 348 | except: 349 | print("detect no results") 350 | return np.empty([0, 5], dtype=np.float32) 351 | bbox_pred = point_form(detection[:, :, :4].view(-1, 4)) 352 | conf_pred = detection[:, :, 4].view(-1, 1) 353 | cls_pred = detection[:, :, 5:].view(-1, num_classes) 354 | 355 | max_conf, max_conf_idx = torch.max(cls_pred, 1) 356 | 357 | max_conf = max_conf.float().unsqueeze(1) 358 | max_conf_idx = max_conf_idx.float().unsqueeze(1) 359 | 360 | # score = (conf_pred * max_conf).view(-1, 1) 361 | score = 
conf_pred 362 | image_pred = torch.cat((bbox_pred, score, max_conf, max_conf_idx), 1) 363 | 364 | non_zero_ind = (torch.nonzero(image_pred[:,4])) 365 | image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1, 7) 366 | try: 367 | img_classes = unique(image_pred_[:,-1]) 368 | except: 369 | print("no class find") 370 | return np.empty([0, 7], dtype=np.float32) 371 | flag = False 372 | out_out = None 373 | for cls in img_classes: 374 | cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1) 375 | class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze() 376 | 377 | image_pred_class = image_pred_[class_mask_ind].view(-1,7) 378 | keep = nms(image_pred_class.cpu().numpy(), nms_conf, force_cpu=True) 379 | image_pred_class = image_pred_class[keep] 380 | if not flag: 381 | out_put = image_pred_class 382 | flag = True 383 | else: 384 | out_put = torch.cat((out_put, image_pred_class), 0) 385 | 386 | 387 | image_pred_class = out_put 388 | if use_pad: 389 | scaling_factor = min(input_wh[0] / ori_wh[0], input_wh[1] / ori_wh[1]) 390 | image_pred_class[:,[0,2]] -= (input_wh[0] - scaling_factor * ori_wh[0]) / 2 391 | image_pred_class[:,[1,3]] -= (input_wh[1] - scaling_factor * ori_wh[1]) / 2 392 | image_pred_class[:,:4] /= scaling_factor 393 | else: 394 | image_pred_class[:,[0,2]] /= input_wh[0] 395 | image_pred_class[:,[1,3]] /= input_wh[1] 396 | image_pred_class[:, [0,2]] *= ori_wh[0] 397 | image_pred_class[:, [1,3]] *= ori_wh[1] 398 | 399 | for i in range(image_pred_class.shape[0]): 400 | image_pred_class[i, [0,2]] = torch.clamp(image_pred_class[i, [0,2]], 0.0, ori_wh[0]) 401 | image_pred_class[i, [1,3]] = torch.clamp(image_pred_class[i, [1,3]], 0.0, ori_wh[1]) 402 | return image_pred_class.cpu().numpy() 403 | 404 | -------------------------------------------------------------------------------- /utils/build.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import os 9 | from os.path import join as pjoin 10 | import numpy as np 11 | from distutils.core import setup 12 | from distutils.extension import Extension 13 | from Cython.Distutils import build_ext 14 | 15 | 16 | def find_in_path(name, path): 17 | "Find a file in a search path" 18 | # adapted fom http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/ 19 | for dir in path.split(os.pathsep): 20 | binpath = pjoin(dir, name) 21 | if os.path.exists(binpath): 22 | return os.path.abspath(binpath) 23 | return None 24 | 25 | 26 | def locate_cuda(): 27 | """Locate the CUDA environment on the system 28 | 29 | Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64' 30 | and values giving the absolute path to each directory. 31 | 32 | Starts by looking for the CUDAHOME env variable. If not found, everything 33 | is based on finding 'nvcc' in the PATH. 
34 | """ 35 | 36 | # first check if the CUDAHOME env variable is in use 37 | if 'CUDAHOME' in os.environ: 38 | home = os.environ['CUDAHOME'] 39 | nvcc = pjoin(home, 'bin', 'nvcc') 40 | else: 41 | # otherwise, search the PATH for NVCC 42 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin') 43 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path) 44 | if nvcc is None: 45 | raise EnvironmentError('The nvcc binary could not be ' 46 | 'located in your $PATH. Either add it to your path, or set $CUDAHOME') 47 | home = os.path.dirname(os.path.dirname(nvcc)) 48 | 49 | cudaconfig = {'home': home, 'nvcc': nvcc, 50 | 'include': pjoin(home, 'include'), 51 | 'lib64': pjoin(home, 'lib64')} 52 | for k, v in cudaconfig.items(): 53 | if not os.path.exists(v): 54 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v)) 55 | 56 | return cudaconfig 57 | 58 | 59 | CUDA = locate_cuda() 60 | 61 | # Obtain the numpy include directory. This logic works across numpy versions. 62 | try: 63 | numpy_include = np.get_include() 64 | except AttributeError: 65 | numpy_include = np.get_numpy_include() 66 | 67 | 68 | def customize_compiler_for_nvcc(self): 69 | """inject deep into distutils to customize how the dispatch 70 | to gcc/nvcc works. 71 | 72 | If you subclass UnixCCompiler, it's not trivial to get your subclass 73 | injected in, and still have the right customizations (i.e. 74 | distutils.sysconfig.customize_compiler) run on it. So instead of going 75 | the OO route, I have this. Note, it's kindof like a wierd functional 76 | subclassing going on.""" 77 | 78 | # tell the compiler it can processes .cu 79 | self.src_extensions.append('.cu') 80 | 81 | # save references to the default compiler_so and _comple methods 82 | default_compiler_so = self.compiler_so 83 | super = self._compile 84 | 85 | # now redefine the _compile method. This gets executed for each 86 | # object but distutils doesn't have the ability to change compilers 87 | # based on source extension: we add it. 
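    # The replacement below inspects each source file's extension: '.cu' sources are
    # routed to nvcc and get the 'nvcc' entry of the extra_compile_args dict
    # (see ext_modules further down), while every other source keeps the default
    # compiler and the 'gcc' entry. compiler_so is restored after each file,
    # so non-CUDA objects are built exactly as before.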
88 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts): 89 | print(extra_postargs) 90 | if os.path.splitext(src)[1] == '.cu': 91 | # use the cuda for .cu files 92 | self.set_executable('compiler_so', CUDA['nvcc']) 93 | # use only a subset of the extra_postargs, which are 1-1 translated 94 | # from the extra_compile_args in the Extension class 95 | postargs = extra_postargs['nvcc'] 96 | else: 97 | postargs = extra_postargs['gcc'] 98 | 99 | super(obj, src, ext, cc_args, postargs, pp_opts) 100 | # reset the default compiler_so, which we might have changed for cuda 101 | self.compiler_so = default_compiler_so 102 | 103 | # inject our redefined _compile method into the class 104 | self._compile = _compile 105 | 106 | 107 | # run the customize_compiler 108 | class custom_build_ext(build_ext): 109 | def build_extensions(self): 110 | customize_compiler_for_nvcc(self.compiler) 111 | build_ext.build_extensions(self) 112 | 113 | 114 | ext_modules = [ 115 | Extension( 116 | "nms.cpu_nms", 117 | ["nms/cpu_nms.pyx"], 118 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 119 | include_dirs=[numpy_include] 120 | ), 121 | Extension('nms.gpu_nms', 122 | ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'], 123 | library_dirs=[CUDA['lib64']], 124 | libraries=['cudart'], 125 | language='c++', 126 | runtime_library_dirs=[CUDA['lib64']], 127 | # this syntax is specific to this build system 128 | # we're only going to use certain compiler args with nvcc and not with gcc 129 | # the implementation of this trick is in customize_compiler() below 130 | extra_compile_args={'gcc': ["-Wno-unused-function"], 131 | 'nvcc': ['-arch=sm_61', 132 | '--ptxas-options=-v', 133 | '-c', 134 | '--compiler-options', 135 | "'-fPIC'"]}, 136 | include_dirs=[numpy_include, CUDA['include']] 137 | ), 138 | # Extension( 139 | # 'pycocotools._mask', 140 | # sources=['pycocotools/maskApi.c', 'pycocotools/_mask.pyx'], 141 | # include_dirs=[numpy_include, 'pycocotools'], 142 | # extra_compile_args={ 143 | # 'gcc': ['-Wno-cpp', '-Wno-unused-function', '-std=c99']}, 144 | # ), 145 | ] 146 | 147 | setup( 148 | name='mot_utils', 149 | ext_modules=ext_modules, 150 | # inject our custom trigger 151 | cmdclass={'build_ext': custom_build_ext}, 152 | ) 153 | -------------------------------------------------------------------------------- /utils/gen_anchors.py: -------------------------------------------------------------------------------- 1 | import random 2 | import argparse 3 | import numpy as np 4 | import os 5 | import sys 6 | if sys.version_info[0] == 2: 7 | import xml.etree.cElementTree as ET 8 | else: 9 | import xml.etree.ElementTree as ET 10 | import pickle 11 | 12 | import json 13 | 14 | def parse_voc_annotation(ann_dir, img_dir, train_val_list, cache_name, labels=[]): 15 | if os.path.exists(cache_name): 16 | with open(cache_name, 'rb') as handle: 17 | cache = pickle.load(handle) 18 | all_insts, seen_labels = cache['all_insts'], cache['seen_labels'] 19 | else: 20 | all_insts = [] 21 | seen_labels = {} 22 | 23 | for ann in sorted(train_val_list): 24 | img = {'object':[]} 25 | 26 | try: 27 | tree = ET.parse(os.path.join(ann_dir, ann + ".xml")) 28 | except Exception as e: 29 | print(e) 30 | print('Ignore this bad annotation: ' + ann_dir + ann) 31 | continue 32 | 33 | for elem in tree.iter(): 34 | if 'filename' in elem.tag: 35 | img['filename'] = os.path.join(img_dir, elem.text + ".jpg") 36 | if 'width' in elem.tag: 37 | img['width'] = int(elem.text) 38 | if 'height' in elem.tag: 39 | img['height'] = int(elem.text) 40 | if 
'object' in elem.tag or 'part' in elem.tag: 41 | obj = {} 42 | 43 | for attr in list(elem): 44 | if 'name' in attr.tag: 45 | obj['name'] = attr.text 46 | 47 | if obj['name'] in seen_labels: 48 | seen_labels[obj['name']] += 1 49 | else: 50 | seen_labels[obj['name']] = 1 51 | 52 | if len(labels) > 0 and obj['name'] not in labels: 53 | break 54 | else: 55 | img['object'] += [obj] 56 | 57 | if 'bndbox' in attr.tag: 58 | for dim in list(attr): 59 | if 'xmin' in dim.tag: 60 | obj['xmin'] = int(round(float(dim.text))) 61 | if 'ymin' in dim.tag: 62 | obj['ymin'] = int(round(float(dim.text))) 63 | if 'xmax' in dim.tag: 64 | obj['xmax'] = int(round(float(dim.text))) 65 | if 'ymax' in dim.tag: 66 | obj['ymax'] = int(round(float(dim.text))) 67 | 68 | if len(img['object']) > 0: 69 | all_insts += [img] 70 | 71 | cache = {'all_insts': all_insts, 'seen_labels': seen_labels} 72 | with open(cache_name, 'wb') as handle: 73 | pickle.dump(cache, handle, protocol=pickle.HIGHEST_PROTOCOL) 74 | 75 | return all_insts, seen_labels 76 | 77 | def IOU(ann, centroids): 78 | w, h = ann 79 | similarities = [] 80 | 81 | for centroid in centroids: 82 | c_w, c_h = centroid 83 | 84 | if c_w >= w and c_h >= h: 85 | similarity = w*h/(c_w*c_h) 86 | elif c_w >= w and c_h <= h: 87 | similarity = w*c_h/(w*h + (c_w-w)*c_h) 88 | elif c_w <= w and c_h >= h: 89 | similarity = c_w*h/(w*h + c_w*(c_h-h)) 90 | else: #means both w,h are bigger than c_w and c_h respectively 91 | similarity = (c_w*c_h)/(w*h) 92 | similarities.append(similarity) # will become (k,) shape 93 | 94 | return np.array(similarities) 95 | 96 | def avg_IOU(anns, centroids): 97 | n,d = anns.shape 98 | sum = 0. 99 | 100 | for i in range(anns.shape[0]): 101 | sum+= max(IOU(anns[i], centroids)) 102 | 103 | return sum/n 104 | 105 | def print_anchors(centroids): 106 | out_string = '' 107 | 108 | anchors = centroids.copy() 109 | 110 | widths = anchors[:, 0] 111 | sorted_indices = np.argsort(widths) 112 | 113 | r = "anchors: [" 114 | for i in sorted_indices: 115 | out_string += str(int(anchors[i,0]*416)) + ',' + str(int(anchors[i,1]*416)) + ', ' 116 | 117 | print(out_string[:-2]) 118 | 119 | def run_kmeans(ann_dims, anchor_num): 120 | ann_num = ann_dims.shape[0] 121 | iterations = 0 122 | prev_assignments = np.ones(ann_num)*(-1) 123 | iteration = 0 124 | old_distances = np.zeros((ann_num, anchor_num)) 125 | 126 | indices = [random.randrange(ann_dims.shape[0]) for i in range(anchor_num)] 127 | centroids = ann_dims[indices] 128 | anchor_dim = ann_dims.shape[1] 129 | 130 | while True: 131 | distances = [] 132 | iteration += 1 133 | for i in range(ann_num): 134 | d = 1 - IOU(ann_dims[i], centroids) 135 | distances.append(d) 136 | distances = np.array(distances) # distances.shape = (ann_num, anchor_num) 137 | 138 | print("iteration {}: dists = {}".format(iteration, np.sum(np.abs(old_distances-distances)))) 139 | 140 | #assign samples to centroids 141 | assignments = np.argmin(distances,axis=1) 142 | 143 | if (assignments == prev_assignments).all() : 144 | return centroids 145 | 146 | #calculate new centroids 147 | centroid_sums=np.zeros((anchor_num, anchor_dim), np.float) 148 | for i in range(ann_num): 149 | centroid_sums[assignments[i]]+=ann_dims[i] 150 | for j in range(anchor_num): 151 | centroids[j] = centroid_sums[j]/(np.sum(assignments==j) + 1e-6) 152 | 153 | prev_assignments = assignments.copy() 154 | old_distances = distances.copy() 155 | 156 | def _main_(argv): 157 | num_anchors = args.anchors 158 | train_annot_folder = "/localSSD/yyq/VOCdevkit0712/VOC0712/Annotations/" 
159 | train_image_folder = "/localSSD/yyq/VOCdevkit0712/VOC0712/JPEGImages/" 160 | train_val_txt = "/localSSD/yyq/VOCdevkit0712/VOC0712/ImageSets/Main/0712_trainval_test.txt" 161 | with open(train_val_txt, "r") as f: 162 | train_val_list = [i.strip() for i in f.readlines()] 163 | cache_name = "voc_train.pkl" 164 | labels = ["aeroplane", "bicycle", "bird", "boat", 165 | "bottle", "bus", "car", "cat", "chair", 166 | "cow", "diningtable", "dog", "horse", 167 | "motorbike", "person", "pottedplant", 168 | "sheep", "sofa", "train", "tvmonitor"] 169 | 170 | train_imgs, train_labels = parse_voc_annotation( 171 | train_annot_folder, 172 | train_image_folder, 173 | train_val_list, 174 | cache_name, 175 | labels 176 | ) 177 | 178 | # run k_mean to find the anchors 179 | annotation_dims = [] 180 | for image in train_imgs: 181 | # print(image['filename']) 182 | for obj in image['object']: 183 | relative_w = (float(obj['xmax']) - float(obj['xmin']))/image['width'] 184 | relatice_h = (float(obj["ymax"]) - float(obj['ymin']))/image['height'] 185 | annotation_dims.append(tuple(map(float, (relative_w,relatice_h)))) 186 | 187 | annotation_dims = np.array(annotation_dims) 188 | centroids = run_kmeans(annotation_dims, num_anchors) 189 | 190 | # write anchors to file 191 | print('\naverage IOU for', num_anchors, 'anchors:', '%0.2f' % avg_IOU(annotation_dims, centroids)) 192 | print_anchors(centroids) 193 | 194 | if __name__ == '__main__': 195 | argparser = argparse.ArgumentParser() 196 | 197 | argparser.add_argument( 198 | '-c', 199 | '--conf', 200 | default='config.json', 201 | help='path to configuration file') 202 | argparser.add_argument( 203 | '-a', 204 | '--anchors', 205 | default=9, 206 | help='number of anchors to use') 207 | 208 | args = argparser.parse_args() 209 | _main_(args) -------------------------------------------------------------------------------- /utils/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/utils/nms/__init__.py -------------------------------------------------------------------------------- /utils/nms/cpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 12 | return a if a >= b else b 13 | 14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 15 | return a if a <= b else b 16 | 17 | cdef inline np.float32_t abs(np.float32_t a, np.float32_t b): 18 | return a - b if a >= b else b - a 19 | 20 | def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 21 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 22 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 23 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 24 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 25 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 26 | 27 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 28 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 29 | 30 | cdef int ndets = dets.shape[0] 31 | cdef np.ndarray[np.int_t, ndim=1] suppressed = 
\ 32 | np.zeros((ndets), dtype=np.int) 33 | 34 | # nominal indices 35 | cdef int _i, _j 36 | # sorted indices 37 | cdef int i, j 38 | # temp variables for box i's (the box currently under consideration) 39 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 40 | # variables for computing overlap with box j (lower scoring box) 41 | cdef np.float32_t xx1, yy1, xx2, yy2 42 | cdef np.float32_t w, h 43 | cdef np.float32_t inter, ovr 44 | 45 | keep = [] 46 | for _i in range(ndets): 47 | i = order[_i] 48 | if suppressed[i] == 1: 49 | continue 50 | keep.append(i) 51 | ix1 = x1[i] 52 | iy1 = y1[i] 53 | ix2 = x2[i] 54 | iy2 = y2[i] 55 | iarea = areas[i] 56 | for _j in range(_i + 1, ndets): 57 | j = order[_j] 58 | if suppressed[j] == 1: 59 | continue 60 | xx1 = max(ix1, x1[j]) 61 | yy1 = max(iy1, y1[j]) 62 | xx2 = min(ix2, x2[j]) 63 | yy2 = min(iy2, y2[j]) 64 | w = max(0.0, xx2 - xx1 + 1) 65 | h = max(0.0, yy2 - yy1 + 1) 66 | inter = w * h 67 | ovr = inter / (iarea + areas[j] - inter) 68 | if ovr >= thresh: 69 | suppressed[j] = 1 70 | 71 | return keep 72 | 73 | def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0): 74 | cdef unsigned int N = boxes.shape[0] 75 | cdef float iw, ih, box_area 76 | cdef float ua 77 | cdef int pos = 0 78 | cdef float maxscore = 0 79 | cdef int maxpos = 0 80 | cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov 81 | 82 | for i in range(N): 83 | maxscore = boxes[i, 4] 84 | maxpos = i 85 | 86 | tx1 = boxes[i,0] 87 | ty1 = boxes[i,1] 88 | tx2 = boxes[i,2] 89 | ty2 = boxes[i,3] 90 | ts = boxes[i,4] 91 | 92 | pos = i + 1 93 | # get max box 94 | while pos < N: 95 | if maxscore < boxes[pos, 4]: 96 | maxscore = boxes[pos, 4] 97 | maxpos = pos 98 | pos = pos + 1 99 | 100 | # add max box as a detection 101 | boxes[i,0] = boxes[maxpos,0] 102 | boxes[i,1] = boxes[maxpos,1] 103 | boxes[i,2] = boxes[maxpos,2] 104 | boxes[i,3] = boxes[maxpos,3] 105 | boxes[i,4] = boxes[maxpos,4] 106 | 107 | # swap ith box with position of max box 108 | boxes[maxpos,0] = tx1 109 | boxes[maxpos,1] = ty1 110 | boxes[maxpos,2] = tx2 111 | boxes[maxpos,3] = ty2 112 | boxes[maxpos,4] = ts 113 | 114 | tx1 = boxes[i,0] 115 | ty1 = boxes[i,1] 116 | tx2 = boxes[i,2] 117 | ty2 = boxes[i,3] 118 | ts = boxes[i,4] 119 | 120 | pos = i + 1 121 | # NMS iterations, note that N changes if detection boxes fall below threshold 122 | while pos < N: 123 | x1 = boxes[pos, 0] 124 | y1 = boxes[pos, 1] 125 | x2 = boxes[pos, 2] 126 | y2 = boxes[pos, 3] 127 | s = boxes[pos, 4] 128 | 129 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 130 | iw = (min(tx2, x2) - max(tx1, x1) + 1) 131 | if iw > 0: 132 | ih = (min(ty2, y2) - max(ty1, y1) + 1) 133 | if ih > 0: 134 | ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih) 135 | ov = iw * ih / ua #iou between max box and detection box 136 | 137 | if method == 1: # linear 138 | if ov > Nt: 139 | weight = 1 - ov 140 | else: 141 | weight = 1 142 | elif method == 2: # gaussian 143 | weight = np.exp(-(ov * ov)/sigma) 144 | else: # original NMS 145 | if ov > Nt: 146 | weight = 0 147 | else: 148 | weight = 1 149 | 150 | boxes[pos, 4] = weight*boxes[pos, 4] 151 | 152 | # if box score falls below threshold, discard the box by swapping with last box 153 | # update N 154 | if boxes[pos, 4] < threshold: 155 | boxes[pos,0] = boxes[N-1, 0] 156 | boxes[pos,1] = boxes[N-1, 1] 157 | boxes[pos,2] = boxes[N-1, 2] 158 | boxes[pos,3] = boxes[N-1, 3] 159 | boxes[pos,4] = boxes[N-1, 4] 160 | N = N - 1 161 | pos = pos - 1 162 | 163 | pos = pos + 
1 164 | 165 | keep = [i for i in range(N)] 166 | return keep 167 | -------------------------------------------------------------------------------- /utils/nms/gpu_nms.hpp: -------------------------------------------------------------------------------- 1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 2 | int boxes_dim, float nms_overlap_thresh, int device_id); 3 | -------------------------------------------------------------------------------- /utils/nms/gpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | assert sizeof(int) == sizeof(np.int32_t) 12 | 13 | cdef extern from "gpu_nms.hpp": 14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int) 15 | 16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh, 17 | np.int32_t device_id=0): 18 | cdef int boxes_num = dets.shape[0] 19 | cdef int boxes_dim = dets.shape[1] 20 | cdef int num_out 21 | cdef np.ndarray[np.int32_t, ndim=1] \ 22 | keep = np.zeros(boxes_num, dtype=np.int32) 23 | cdef np.ndarray[np.float32_t, ndim=1] \ 24 | scores = dets[:, 4] 25 | cdef np.ndarray[np.int_t, ndim=1] \ 26 | order = scores.argsort()[::-1] 27 | cdef np.ndarray[np.float32_t, ndim=2] \ 28 | sorted_dets = dets[order, :] 29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id) 30 | keep = keep[:num_out] 31 | return list(order[keep]) 32 | -------------------------------------------------------------------------------- /utils/nms/nms_kernel.cu: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------ 2 | // Faster R-CNN 3 | // Copyright (c) 2015 Microsoft 4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details] 5 | // Written by Shaoqing Ren 6 | // ------------------------------------------------------------------ 7 | 8 | #include "gpu_nms.hpp" 9 | #include 10 | #include 11 | 12 | #define CUDA_CHECK(condition) \ 13 | /* Code block avoids redefinition of cudaError_t error */ \ 14 | do { \ 15 | cudaError_t error = condition; \ 16 | if (error != cudaSuccess) { \ 17 | std::cout << cudaGetErrorString(error) << std::endl; \ 18 | } \ 19 | } while (0) 20 | 21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 22 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 23 | 24 | __device__ inline float devIoU(float const * const a, float const * const b) { 25 | float left = max(a[0], b[0]), right = min(a[2], b[2]); 26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]); 27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f); 28 | float interS = width * height; 29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1); 30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1); 31 | return interS / (Sa + Sb - interS); 32 | } 33 | 34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh, 35 | const float *dev_boxes, unsigned long long *dev_mask) { 36 | const int row_start = blockIdx.y; 37 | const int col_start = blockIdx.x; 38 | 39 | // if (row_start > col_start) return; 40 | 41 | const int row_size = 42 | min(n_boxes - row_start * threadsPerBlock, 
threadsPerBlock); 43 | const int col_size = 44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 45 | 46 | __shared__ float block_boxes[threadsPerBlock * 5]; 47 | if (threadIdx.x < col_size) { 48 | block_boxes[threadIdx.x * 5 + 0] = 49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0]; 50 | block_boxes[threadIdx.x * 5 + 1] = 51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1]; 52 | block_boxes[threadIdx.x * 5 + 2] = 53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2]; 54 | block_boxes[threadIdx.x * 5 + 3] = 55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3]; 56 | block_boxes[threadIdx.x * 5 + 4] = 57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4]; 58 | } 59 | __syncthreads(); 60 | 61 | if (threadIdx.x < row_size) { 62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 63 | const float *cur_box = dev_boxes + cur_box_idx * 5; 64 | int i = 0; 65 | unsigned long long t = 0; 66 | int start = 0; 67 | if (row_start == col_start) { 68 | start = threadIdx.x + 1; 69 | } 70 | for (i = start; i < col_size; i++) { 71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) { 72 | t |= 1ULL << i; 73 | } 74 | } 75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock); 76 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 77 | } 78 | } 79 | 80 | void _set_device(int device_id) { 81 | int current_device; 82 | CUDA_CHECK(cudaGetDevice(¤t_device)); 83 | if (current_device == device_id) { 84 | return; 85 | } 86 | // The call to cudaSetDevice must come before any calls to Get, which 87 | // may perform initialization using the GPU. 88 | CUDA_CHECK(cudaSetDevice(device_id)); 89 | } 90 | 91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 92 | int boxes_dim, float nms_overlap_thresh, int device_id) { 93 | _set_device(device_id); 94 | 95 | float* boxes_dev = NULL; 96 | unsigned long long* mask_dev = NULL; 97 | 98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock); 99 | 100 | CUDA_CHECK(cudaMalloc(&boxes_dev, 101 | boxes_num * boxes_dim * sizeof(float))); 102 | CUDA_CHECK(cudaMemcpy(boxes_dev, 103 | boxes_host, 104 | boxes_num * boxes_dim * sizeof(float), 105 | cudaMemcpyHostToDevice)); 106 | 107 | CUDA_CHECK(cudaMalloc(&mask_dev, 108 | boxes_num * col_blocks * sizeof(unsigned long long))); 109 | 110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock), 111 | DIVUP(boxes_num, threadsPerBlock)); 112 | dim3 threads(threadsPerBlock); 113 | nms_kernel<<>>(boxes_num, 114 | nms_overlap_thresh, 115 | boxes_dev, 116 | mask_dev); 117 | 118 | std::vector mask_host(boxes_num * col_blocks); 119 | CUDA_CHECK(cudaMemcpy(&mask_host[0], 120 | mask_dev, 121 | sizeof(unsigned long long) * boxes_num * col_blocks, 122 | cudaMemcpyDeviceToHost)); 123 | 124 | std::vector remv(col_blocks); 125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 126 | 127 | int num_to_keep = 0; 128 | for (int i = 0; i < boxes_num; i++) { 129 | int nblock = i / threadsPerBlock; 130 | int inblock = i % threadsPerBlock; 131 | 132 | if (!(remv[nblock] & (1ULL << inblock))) { 133 | keep_out[num_to_keep++] = i; 134 | unsigned long long *p = &mask_host[0] + i * col_blocks; 135 | for (int j = nblock; j < col_blocks; j++) { 136 | remv[j] |= p[j]; 137 | } 138 | } 139 | } 140 | *num_out = num_to_keep; 141 | 142 | CUDA_CHECK(cudaFree(boxes_dev)); 143 | CUDA_CHECK(cudaFree(mask_dev)); 144 | } 145 | -------------------------------------------------------------------------------- 
/utils/nms/py_cpu_nms.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | def py_cpu_nms(dets, thresh): 11 | """Pure Python NMS baseline.""" 12 | x1 = dets[:, 0] 13 | y1 = dets[:, 1] 14 | x2 = dets[:, 2] 15 | y2 = dets[:, 3] 16 | scores = dets[:, 4] 17 | 18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 19 | order = scores.argsort()[::-1] 20 | 21 | keep = [] 22 | while order.size > 0: 23 | i = order[0] 24 | keep.append(i) 25 | xx1 = np.maximum(x1[i], x1[order[1:]]) 26 | yy1 = np.maximum(y1[i], y1[order[1:]]) 27 | xx2 = np.minimum(x2[i], x2[order[1:]]) 28 | yy2 = np.minimum(y2[i], y2[order[1:]]) 29 | 30 | w = np.maximum(0.0, xx2 - xx1 + 1) 31 | h = np.maximum(0.0, yy2 - yy1 + 1) 32 | inter = w * h 33 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 34 | 35 | inds = np.where(ovr <= thresh)[0] 36 | order = order[inds + 1] 37 | 38 | return keep 39 | -------------------------------------------------------------------------------- /utils/nms_wrapper.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | from .nms.cpu_nms import cpu_nms, cpu_soft_nms 9 | from .nms.gpu_nms import gpu_nms 10 | 11 | def nms(dets, thresh, force_cpu=False): 12 | """Dispatch to either CPU or GPU NMS implementations.""" 13 | 14 | if dets.shape[0] == 0: 15 | return [] 16 | if force_cpu: 17 | #return cpu_soft_nms(dets, thresh, method = 0) 18 | return cpu_nms(dets, thresh) 19 | return gpu_nms(dets, thresh) 20 | 21 | def soft_nms(dets, Nt=0.3, sigma=0.5, thresh=0.001, method=1): 22 | """Dispatch to either CPU or GPU NMS implementations.""" 23 | 24 | if dets.shape[0] == 0: 25 | return [] 26 | return cpu_soft_nms(dets, sigma, Nt, thresh, method) -------------------------------------------------------------------------------- /utils/preprocess.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.autograd import Variable 7 | import numpy as np 8 | import cv2 9 | 10 | 11 | def letterbox_image(img, resize_wh): 12 | '''resize image with unchanged aspect ratio using padding''' 13 | img_w, img_h = img.shape[1], img.shape[0] 14 | w, h = resize_wh 15 | new_w = int(img_w * min(w/img_w, h/img_h)) 16 | new_h = int(img_h * min(w/img_w, h/img_h)) 17 | resized_image = cv2.resize(img, (new_w, new_h), interpolation = cv2.INTER_CUBIC) 18 | 19 | canvas = np.full((resize_wh[1], resize_wh[0], 3), 128) 20 | 21 | canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image 22 | 23 | return canvas 24 | 25 | def preproc_for_test(img, resize_wh, use_pad=False): 26 | if not use_pad: 27 | img = cv2.resize(img, resize_wh) 28 | else: 29 | img = letterbox_image(img, resize_wh) 30 | img_ = img[:,:,::-1].transpose((2,0,1)).copy() 31 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0) 32 | return img_ 33 | 34 | 
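> The resize helpers above are the counterpart of `detection_postprecess` in `utils/box_utils.py`: with `use_pad=True` the image is letterboxed on the way in and the padding is undone on the way out. Below is a minimal single-image inference sketch of how the two sides fit together; the helper name, the import paths, the class count, and the assumption that the network returns the concatenated, decoded predictions of shape `(1, N, 5 + num_classes)` are ours, not taken from `demo.py`, which is the actual entry point.

```python
import cv2
import torch
from utils.preprocess import preproc_for_test
from utils.box_utils import detection_postprecess

def detect_single_image(net, img_path, input_wh=(416, 416), num_classes=80):
    """Run one image through an already-built Yolov3 `net` (assumed to return
    the concatenated, decoded predictions of shape (1, N, 5 + num_classes))."""
    img = cv2.imread(img_path)
    ori_wh = (img.shape[1], img.shape[0])              # original (width, height)
    # letterbox to input_wh, BGR->RGB, HWC->CHW, scale to [0, 1], add batch dim
    x = preproc_for_test(img, input_wh, use_pad=True)
    with torch.no_grad():
        detection = net(x.cuda())
    # threshold on objectness, per-class NMS, then undo the letterbox padding
    # and rescale the boxes back to the original image size;
    # each row of the result: x1, y1, x2, y2, objectness, class score, class index
    return detection_postprecess(detection, 0.5, num_classes, input_wh, ori_wh,
                                 use_pad=True, nms_conf=0.4)
```

> With `use_pad=False` the preprocessing is a plain `cv2.resize` and `detection_postprecess` simply rescales by the width/height ratios instead of removing the letterbox padding.
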
-------------------------------------------------------------------------------- /utils/pycocotools/__init__.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tylin' 2 | -------------------------------------------------------------------------------- /utils/pycocotools/_mask.pyx: -------------------------------------------------------------------------------- 1 | # distutils: language = c 2 | # distutils: sources = ../common/maskApi.c 3 | 4 | #************************************************************************** 5 | # Microsoft COCO Toolbox. version 2.0 6 | # Data, paper, and tutorials available at: http://mscoco.org/ 7 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 8 | # Licensed under the Simplified BSD License [see coco/license.txt] 9 | #************************************************************************** 10 | 11 | __author__ = 'tsungyi' 12 | 13 | import sys 14 | PYTHON_VERSION = sys.version_info[0] 15 | 16 | # import both Python-level and C-level symbols of Numpy 17 | # the API uses Numpy to interface C and Python 18 | import numpy as np 19 | cimport numpy as np 20 | from libc.stdlib cimport malloc, free 21 | 22 | # intialized Numpy. must do. 23 | np.import_array() 24 | 25 | # import numpy C function 26 | # we use PyArray_ENABLEFLAGS to make Numpy ndarray responsible to memoery management 27 | cdef extern from "numpy/arrayobject.h": 28 | void PyArray_ENABLEFLAGS(np.ndarray arr, int flags) 29 | 30 | # Declare the prototype of the C functions in MaskApi.h 31 | cdef extern from "maskApi.h": 32 | ctypedef unsigned int uint 33 | ctypedef unsigned long siz 34 | ctypedef unsigned char byte 35 | ctypedef double* BB 36 | ctypedef struct RLE: 37 | siz h, 38 | siz w, 39 | siz m, 40 | uint* cnts, 41 | void rlesInit( RLE **R, siz n ) 42 | void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n ) 43 | void rleDecode( const RLE *R, byte *mask, siz n ) 44 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect ) 45 | void rleArea( const RLE *R, siz n, uint *a ) 46 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ) 47 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) 48 | void rleToBbox( const RLE *R, BB bb, siz n ) 49 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ) 50 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ) 51 | char* rleToString( const RLE *R ) 52 | void rleFrString( RLE *R, char *s, siz h, siz w ) 53 | 54 | # python class to wrap RLE array in C 55 | # the class handles the memory allocation and deallocation 56 | cdef class RLEs: 57 | cdef RLE *_R 58 | cdef siz _n 59 | 60 | def __cinit__(self, siz n =0): 61 | rlesInit(&self._R, n) 62 | self._n = n 63 | 64 | # free the RLE array here 65 | def __dealloc__(self): 66 | if self._R is not NULL: 67 | for i in range(self._n): 68 | free(self._R[i].cnts) 69 | free(self._R) 70 | def __getattr__(self, key): 71 | if key == 'n': 72 | return self._n 73 | raise AttributeError(key) 74 | 75 | # python class to wrap Mask array in C 76 | # the class handles the memory allocation and deallocation 77 | cdef class Masks: 78 | cdef byte *_mask 79 | cdef siz _h 80 | cdef siz _w 81 | cdef siz _n 82 | 83 | def __cinit__(self, h, w, n): 84 | self._mask = malloc(h*w*n* sizeof(byte)) 85 | self._h = h 86 | self._w = w 87 | self._n = n 88 | # def __dealloc__(self): 89 | # the memory management of _mask has been passed to np.ndarray 90 | # it doesn't need to be freed here 91 | 92 | # called when passing into np.array() and 
return an np.ndarray in column-major order 93 | def __array__(self): 94 | cdef np.npy_intp shape[1] 95 | shape[0] = self._h*self._w*self._n 96 | # Create a 1D array, and reshape it to fortran/Matlab column-major array 97 | ndarray = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT8, self._mask).reshape((self._h, self._w, self._n), order='F') 98 | # The _mask allocated by Masks is now handled by ndarray 99 | PyArray_ENABLEFLAGS(ndarray, np.NPY_OWNDATA) 100 | return ndarray 101 | 102 | # internal conversion from Python RLEs object to compressed RLE format 103 | def _toString(RLEs Rs): 104 | cdef siz n = Rs.n 105 | cdef bytes py_string 106 | cdef char* c_string 107 | objs = [] 108 | for i in range(n): 109 | c_string = rleToString( &Rs._R[i] ) 110 | py_string = c_string 111 | objs.append({ 112 | 'size': [Rs._R[i].h, Rs._R[i].w], 113 | 'counts': py_string 114 | }) 115 | free(c_string) 116 | return objs 117 | 118 | # internal conversion from compressed RLE format to Python RLEs object 119 | def _frString(rleObjs): 120 | cdef siz n = len(rleObjs) 121 | Rs = RLEs(n) 122 | cdef bytes py_string 123 | cdef char* c_string 124 | for i, obj in enumerate(rleObjs): 125 | if PYTHON_VERSION == 2: 126 | py_string = str(obj['counts']).encode('utf8') 127 | elif PYTHON_VERSION == 3: 128 | py_string = str.encode(obj['counts']) if type(obj['counts']) == str else obj['counts'] 129 | else: 130 | raise Exception('Python version must be 2 or 3') 131 | c_string = py_string 132 | rleFrString( &Rs._R[i], c_string, obj['size'][0], obj['size'][1] ) 133 | return Rs 134 | 135 | # encode mask to RLEs objects 136 | # list of RLE string can be generated by RLEs member function 137 | def encode(np.ndarray[np.uint8_t, ndim=3, mode='fortran'] mask): 138 | h, w, n = mask.shape[0], mask.shape[1], mask.shape[2] 139 | cdef RLEs Rs = RLEs(n) 140 | rleEncode(Rs._R,mask.data,h,w,n) 141 | objs = _toString(Rs) 142 | return objs 143 | 144 | # decode mask from compressed list of RLE string or RLEs object 145 | def decode(rleObjs): 146 | cdef RLEs Rs = _frString(rleObjs) 147 | h, w, n = Rs._R[0].h, Rs._R[0].w, Rs._n 148 | masks = Masks(h, w, n) 149 | rleDecode(Rs._R, masks._mask, n); 150 | return np.array(masks) 151 | 152 | def merge(rleObjs, intersect=0): 153 | cdef RLEs Rs = _frString(rleObjs) 154 | cdef RLEs R = RLEs(1) 155 | rleMerge(Rs._R, R._R, Rs._n, intersect) 156 | obj = _toString(R)[0] 157 | return obj 158 | 159 | def area(rleObjs): 160 | cdef RLEs Rs = _frString(rleObjs) 161 | cdef uint* _a = malloc(Rs._n* sizeof(uint)) 162 | rleArea(Rs._R, Rs._n, _a) 163 | cdef np.npy_intp shape[1] 164 | shape[0] = Rs._n 165 | a = np.array((Rs._n, ), dtype=np.uint8) 166 | a = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT32, _a) 167 | PyArray_ENABLEFLAGS(a, np.NPY_OWNDATA) 168 | return a 169 | 170 | # iou computation. support function overload (RLEs-RLEs and bbox-bbox). 
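# dt and gt may each be given as an Nx4 ndarray/list of [x y w h] boxes or as a
# list of RLE dicts; _preproc normalizes both forms and the call is then
# dispatched to bbIou or rleIou. Entries flagged in pyiscrowd are treated as
# crowd regions, i.e. a detection only needs to cover a subregion of that gt.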
171 | def iou( dt, gt, pyiscrowd ): 172 | def _preproc(objs): 173 | if len(objs) == 0: 174 | return objs 175 | if type(objs) == np.ndarray: 176 | if len(objs.shape) == 1: 177 | objs = objs.reshape((objs[0], 1)) 178 | # check if it's Nx4 bbox 179 | if not len(objs.shape) == 2 or not objs.shape[1] == 4: 180 | raise Exception('numpy ndarray input is only for *bounding boxes* and should have Nx4 dimension') 181 | objs = objs.astype(np.double) 182 | elif type(objs) == list: 183 | # check if list is in box format and convert it to np.ndarray 184 | isbox = np.all(np.array([(len(obj)==4) and ((type(obj)==list) or (type(obj)==np.ndarray)) for obj in objs])) 185 | isrle = np.all(np.array([type(obj) == dict for obj in objs])) 186 | if isbox: 187 | objs = np.array(objs, dtype=np.double) 188 | if len(objs.shape) == 1: 189 | objs = objs.reshape((1,objs.shape[0])) 190 | elif isrle: 191 | objs = _frString(objs) 192 | else: 193 | raise Exception('list input can be bounding box (Nx4) or RLEs ([RLE])') 194 | else: 195 | raise Exception('unrecognized type. The following type: RLEs (rle), np.ndarray (box), and list (box) are supported.') 196 | return objs 197 | def _rleIou(RLEs dt, RLEs gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou): 198 | rleIou( dt._R, gt._R, m, n, iscrowd.data, _iou.data ) 199 | def _bbIou(np.ndarray[np.double_t, ndim=2] dt, np.ndarray[np.double_t, ndim=2] gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou): 200 | bbIou( dt.data, gt.data, m, n, iscrowd.data, _iou.data ) 201 | def _len(obj): 202 | cdef siz N = 0 203 | if type(obj) == RLEs: 204 | N = obj.n 205 | elif len(obj)==0: 206 | pass 207 | elif type(obj) == np.ndarray: 208 | N = obj.shape[0] 209 | return N 210 | # convert iscrowd to numpy array 211 | cdef np.ndarray[np.uint8_t, ndim=1] iscrowd = np.array(pyiscrowd, dtype=np.uint8) 212 | # simple type checking 213 | cdef siz m, n 214 | dt = _preproc(dt) 215 | gt = _preproc(gt) 216 | m = _len(dt) 217 | n = _len(gt) 218 | if m == 0 or n == 0: 219 | return [] 220 | if not type(dt) == type(gt): 221 | raise Exception('The dt and gt should have the same data type, either RLEs, list or np.ndarray') 222 | 223 | # define local variables 224 | cdef double* _iou = 0 225 | cdef np.npy_intp shape[1] 226 | # check type and assign iou function 227 | if type(dt) == RLEs: 228 | _iouFun = _rleIou 229 | elif type(dt) == np.ndarray: 230 | _iouFun = _bbIou 231 | else: 232 | raise Exception('input data type not allowed.') 233 | _iou = malloc(m*n* sizeof(double)) 234 | iou = np.zeros((m*n, ), dtype=np.double) 235 | shape[0] = m*n 236 | iou = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _iou) 237 | PyArray_ENABLEFLAGS(iou, np.NPY_OWNDATA) 238 | _iouFun(dt, gt, iscrowd, m, n, iou) 239 | return iou.reshape((m,n), order='F') 240 | 241 | def toBbox( rleObjs ): 242 | cdef RLEs Rs = _frString(rleObjs) 243 | cdef siz n = Rs.n 244 | cdef BB _bb = malloc(4*n* sizeof(double)) 245 | rleToBbox( Rs._R, _bb, n ) 246 | cdef np.npy_intp shape[1] 247 | shape[0] = 4*n 248 | bb = np.array((1,4*n), dtype=np.double) 249 | bb = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _bb).reshape((n, 4)) 250 | PyArray_ENABLEFLAGS(bb, np.NPY_OWNDATA) 251 | return bb 252 | 253 | def frBbox(np.ndarray[np.double_t, ndim=2] bb, siz h, siz w ): 254 | cdef siz n = bb.shape[0] 255 | Rs = RLEs(n) 256 | rleFrBbox( Rs._R, bb.data, h, w, n ) 257 | objs = _toString(Rs) 258 | return objs 259 | 260 | def frPoly( poly, siz h, siz w ): 261 | cdef 
np.ndarray[np.double_t, ndim=1] np_poly 262 | n = len(poly) 263 | Rs = RLEs(n) 264 | for i, p in enumerate(poly): 265 | np_poly = np.array(p, dtype=np.double, order='F') 266 | rleFrPoly( &Rs._R[i], np_poly.data, int(len(p)/2), h, w ) 267 | objs = _toString(Rs) 268 | return objs 269 | 270 | def frUncompressedRLE(ucRles, siz h, siz w): 271 | cdef np.ndarray[np.uint32_t, ndim=1] cnts 272 | cdef RLE R 273 | cdef uint *data 274 | n = len(ucRles) 275 | objs = [] 276 | for i in range(n): 277 | Rs = RLEs(1) 278 | cnts = np.array(ucRles[i]['counts'], dtype=np.uint32) 279 | # time for malloc can be saved here but it's fine 280 | data = malloc(len(cnts)* sizeof(uint)) 281 | for j in range(len(cnts)): 282 | data[j] = cnts[j] 283 | R = RLE(ucRles[i]['size'][0], ucRles[i]['size'][1], len(cnts), data) 284 | Rs._R[0] = R 285 | objs.append(_toString(Rs)[0]) 286 | return objs 287 | 288 | def frPyObjects(pyobj, h, w): 289 | # encode rle from a list of python objects 290 | if type(pyobj) == np.ndarray: 291 | objs = frBbox(pyobj, h, w) 292 | elif type(pyobj) == list and len(pyobj[0]) == 4: 293 | objs = frBbox(pyobj, h, w) 294 | elif type(pyobj) == list and len(pyobj[0]) > 4: 295 | objs = frPoly(pyobj, h, w) 296 | elif type(pyobj) == list and type(pyobj[0]) == dict \ 297 | and 'counts' in pyobj[0] and 'size' in pyobj[0]: 298 | objs = frUncompressedRLE(pyobj, h, w) 299 | # encode rle from single python object 300 | elif type(pyobj) == list and len(pyobj) == 4: 301 | objs = frBbox([pyobj], h, w)[0] 302 | elif type(pyobj) == list and len(pyobj) > 4: 303 | objs = frPoly([pyobj], h, w)[0] 304 | elif type(pyobj) == dict and 'counts' in pyobj and 'size' in pyobj: 305 | objs = frUncompressedRLE([pyobj], h, w)[0] 306 | else: 307 | raise Exception('input type is not supported.') 308 | return objs 309 | -------------------------------------------------------------------------------- /utils/pycocotools/coco.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tylin' 2 | __version__ = '2.0' 3 | # Interface for accessing the Microsoft COCO dataset. 4 | 5 | # Microsoft COCO is a large image dataset designed for object detection, 6 | # segmentation, and caption generation. pycocotools is a Python API that 7 | # assists in loading, parsing and visualizing the annotations in COCO. 8 | # Please visit http://mscoco.org/ for more information on COCO, including 9 | # for the data, paper, and tutorials. The exact format of the annotations 10 | # is also described on the COCO website. For example usage of the pycocotools 11 | # please see pycocotools_demo.ipynb. In addition to this API, please download both 12 | # the COCO images and annotations in order to run the demo. 13 | 14 | # An alternative to using the API is to load the annotations directly 15 | # into Python dictionary 16 | # Using the API provides additional utility functions. Note that this API 17 | # supports both *instance* and *caption* annotations. In the case of 18 | # captions not all functions are defined (e.g. categories are undefined). 19 | 20 | # The following API functions are defined: 21 | # COCO - COCO api class that loads COCO annotation file and prepare data structures. 22 | # decodeMask - Decode binary mask M encoded via run-length encoding. 23 | # encodeMask - Encode binary mask M using run-length encoding. 24 | # getAnnIds - Get ann ids that satisfy given filter conditions. 25 | # getCatIds - Get cat ids that satisfy given filter conditions. 
26 | # getImgIds - Get img ids that satisfy given filter conditions. 27 | # loadAnns - Load anns with the specified ids. 28 | # loadCats - Load cats with the specified ids. 29 | # loadImgs - Load imgs with the specified ids. 30 | # annToMask - Convert segmentation in an annotation to binary mask. 31 | # showAnns - Display the specified annotations. 32 | # loadRes - Load algorithm results and create API for accessing them. 33 | # download - Download COCO images from mscoco.org server. 34 | # Throughout the API "ann"=annotation, "cat"=category, and "img"=image. 35 | # Help on each functions can be accessed by: "help COCO>function". 36 | 37 | # See also COCO>decodeMask, 38 | # COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds, 39 | # COCO>getImgIds, COCO>loadAnns, COCO>loadCats, 40 | # COCO>loadImgs, COCO>annToMask, COCO>showAnns 41 | 42 | # Microsoft COCO Toolbox. version 2.0 43 | # Data, paper, and tutorials available at: http://mscoco.org/ 44 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2014. 45 | # Licensed under the Simplified BSD License [see bsd.txt] 46 | 47 | import json 48 | import time 49 | import matplotlib.pyplot as plt 50 | from matplotlib.collections import PatchCollection 51 | from matplotlib.patches import Polygon 52 | import numpy as np 53 | import copy 54 | import itertools 55 | from . import mask as maskUtils 56 | import os 57 | from collections import defaultdict 58 | import sys 59 | PYTHON_VERSION = sys.version_info[0] 60 | if PYTHON_VERSION == 2: 61 | from urllib import urlretrieve 62 | elif PYTHON_VERSION == 3: 63 | from urllib.request import urlretrieve 64 | 65 | class COCO: 66 | def __init__(self, annotation_file=None): 67 | """ 68 | Constructor of Microsoft COCO helper class for reading and visualizing annotations. 69 | :param annotation_file (str): location of annotation file 70 | :param image_folder (str): location to the folder that hosts images. 71 | :return: 72 | """ 73 | # load dataset 74 | self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict() 75 | self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list) 76 | if not annotation_file == None: 77 | print('loading annotations into memory...') 78 | tic = time.time() 79 | dataset = json.load(open(annotation_file, 'r')) 80 | assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset)) 81 | print('Done (t={:0.2f}s)'.format(time.time()- tic)) 82 | self.dataset = dataset 83 | self.createIndex() 84 | 85 | def createIndex(self): 86 | # create index 87 | print('creating index...') 88 | anns, cats, imgs = {}, {}, {} 89 | imgToAnns,catToImgs = defaultdict(list),defaultdict(list) 90 | if 'annotations' in self.dataset: 91 | for ann in self.dataset['annotations']: 92 | imgToAnns[ann['image_id']].append(ann) 93 | anns[ann['id']] = ann 94 | 95 | if 'images' in self.dataset: 96 | for img in self.dataset['images']: 97 | imgs[img['id']] = img 98 | 99 | if 'categories' in self.dataset: 100 | for cat in self.dataset['categories']: 101 | cats[cat['id']] = cat 102 | 103 | if 'annotations' in self.dataset and 'categories' in self.dataset: 104 | for ann in self.dataset['annotations']: 105 | catToImgs[ann['category_id']].append(ann['image_id']) 106 | 107 | print('index created!') 108 | 109 | # create class members 110 | self.anns = anns 111 | self.imgToAnns = imgToAnns 112 | self.catToImgs = catToImgs 113 | self.imgs = imgs 114 | self.cats = cats 115 | 116 | def info(self): 117 | """ 118 | Print information about the annotation file. 
119 | :return: 120 | """ 121 | for key, value in self.dataset['info'].items(): 122 | print('{}: {}'.format(key, value)) 123 | 124 | def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None): 125 | """ 126 | Get ann ids that satisfy given filter conditions. default skips that filter 127 | :param imgIds (int array) : get anns for given imgs 128 | catIds (int array) : get anns for given cats 129 | areaRng (float array) : get anns for given area range (e.g. [0 inf]) 130 | iscrowd (boolean) : get anns for given crowd label (False or True) 131 | :return: ids (int array) : integer array of ann ids 132 | """ 133 | imgIds = imgIds if type(imgIds) == list else [imgIds] 134 | catIds = catIds if type(catIds) == list else [catIds] 135 | 136 | if len(imgIds) == len(catIds) == len(areaRng) == 0: 137 | anns = self.dataset['annotations'] 138 | else: 139 | if not len(imgIds) == 0: 140 | lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns] 141 | anns = list(itertools.chain.from_iterable(lists)) 142 | else: 143 | anns = self.dataset['annotations'] 144 | anns = anns if len(catIds) == 0 else [ann for ann in anns if ann['category_id'] in catIds] 145 | anns = anns if len(areaRng) == 0 else [ann for ann in anns if ann['area'] > areaRng[0] and ann['area'] < areaRng[1]] 146 | if not iscrowd == None: 147 | ids = [ann['id'] for ann in anns if ann['iscrowd'] == iscrowd] 148 | else: 149 | ids = [ann['id'] for ann in anns] 150 | return ids 151 | 152 | def getCatIds(self, catNms=[], supNms=[], catIds=[]): 153 | """ 154 | filtering parameters. default skips that filter. 155 | :param catNms (str array) : get cats for given cat names 156 | :param supNms (str array) : get cats for given supercategory names 157 | :param catIds (int array) : get cats for given cat ids 158 | :return: ids (int array) : integer array of cat ids 159 | """ 160 | catNms = catNms if type(catNms) == list else [catNms] 161 | supNms = supNms if type(supNms) == list else [supNms] 162 | catIds = catIds if type(catIds) == list else [catIds] 163 | 164 | if len(catNms) == len(supNms) == len(catIds) == 0: 165 | cats = self.dataset['categories'] 166 | else: 167 | cats = self.dataset['categories'] 168 | cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms] 169 | cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms] 170 | cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds] 171 | ids = [cat['id'] for cat in cats] 172 | return ids 173 | 174 | def getImgIds(self, imgIds=[], catIds=[]): 175 | ''' 176 | Get img ids that satisfy given filter conditions. 177 | :param imgIds (int array) : get imgs for given ids 178 | :param catIds (int array) : get imgs with all given cats 179 | :return: ids (int array) : integer array of img ids 180 | ''' 181 | imgIds = imgIds if type(imgIds) == list else [imgIds] 182 | catIds = catIds if type(catIds) == list else [catIds] 183 | 184 | if len(imgIds) == len(catIds) == 0: 185 | ids = self.imgs.keys() 186 | else: 187 | ids = set(imgIds) 188 | for i, catId in enumerate(catIds): 189 | if i == 0 and len(ids) == 0: 190 | ids = set(self.catToImgs[catId]) 191 | else: 192 | ids &= set(self.catToImgs[catId]) 193 | return list(ids) 194 | 195 | def loadAnns(self, ids=[]): 196 | """ 197 | Load anns with the specified ids. 
198 | :param ids (int array) : integer ids specifying anns 199 | :return: anns (object array) : loaded ann objects 200 | """ 201 | if type(ids) == list: 202 | return [self.anns[id] for id in ids] 203 | elif type(ids) == int: 204 | return [self.anns[ids]] 205 | 206 | def loadCats(self, ids=[]): 207 | """ 208 | Load cats with the specified ids. 209 | :param ids (int array) : integer ids specifying cats 210 | :return: cats (object array) : loaded cat objects 211 | """ 212 | if type(ids) == list: 213 | return [self.cats[id] for id in ids] 214 | elif type(ids) == int: 215 | return [self.cats[ids]] 216 | 217 | def loadImgs(self, ids=[]): 218 | """ 219 | Load anns with the specified ids. 220 | :param ids (int array) : integer ids specifying img 221 | :return: imgs (object array) : loaded img objects 222 | """ 223 | if type(ids) == list: 224 | return [self.imgs[id] for id in ids] 225 | elif type(ids) == int: 226 | return [self.imgs[ids]] 227 | 228 | def showAnns(self, anns): 229 | """ 230 | Display the specified annotations. 231 | :param anns (array of object): annotations to display 232 | :return: None 233 | """ 234 | if len(anns) == 0: 235 | return 0 236 | if 'segmentation' in anns[0] or 'keypoints' in anns[0]: 237 | datasetType = 'instances' 238 | elif 'caption' in anns[0]: 239 | datasetType = 'captions' 240 | else: 241 | raise Exception('datasetType not supported') 242 | if datasetType == 'instances': 243 | ax = plt.gca() 244 | ax.set_autoscale_on(False) 245 | polygons = [] 246 | color = [] 247 | for ann in anns: 248 | c = (np.random.random((1, 3))*0.6+0.4).tolist()[0] 249 | if 'segmentation' in ann: 250 | if type(ann['segmentation']) == list: 251 | # polygon 252 | for seg in ann['segmentation']: 253 | poly = np.array(seg).reshape((int(len(seg)/2), 2)) 254 | polygons.append(Polygon(poly)) 255 | color.append(c) 256 | else: 257 | # mask 258 | t = self.imgs[ann['image_id']] 259 | if type(ann['segmentation']['counts']) == list: 260 | rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width']) 261 | else: 262 | rle = [ann['segmentation']] 263 | m = maskUtils.decode(rle) 264 | img = np.ones( (m.shape[0], m.shape[1], 3) ) 265 | if ann['iscrowd'] == 1: 266 | color_mask = np.array([2.0,166.0,101.0])/255 267 | if ann['iscrowd'] == 0: 268 | color_mask = np.random.random((1, 3)).tolist()[0] 269 | for i in range(3): 270 | img[:,:,i] = color_mask[i] 271 | ax.imshow(np.dstack( (img, m*0.5) )) 272 | if 'keypoints' in ann and type(ann['keypoints']) == list: 273 | # turn skeleton into zero-based index 274 | sks = np.array(self.loadCats(ann['category_id'])[0]['skeleton'])-1 275 | kp = np.array(ann['keypoints']) 276 | x = kp[0::3] 277 | y = kp[1::3] 278 | v = kp[2::3] 279 | for sk in sks: 280 | if np.all(v[sk]>0): 281 | plt.plot(x[sk],y[sk], linewidth=3, color=c) 282 | plt.plot(x[v>0], y[v>0],'o',markersize=8, markerfacecolor=c, markeredgecolor='k',markeredgewidth=2) 283 | plt.plot(x[v>1], y[v>1],'o',markersize=8, markerfacecolor=c, markeredgecolor=c, markeredgewidth=2) 284 | p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4) 285 | ax.add_collection(p) 286 | p = PatchCollection(polygons, facecolor='none', edgecolors=color, linewidths=2) 287 | ax.add_collection(p) 288 | elif datasetType == 'captions': 289 | for ann in anns: 290 | print(ann['caption']) 291 | 292 | def loadRes(self, resFile): 293 | """ 294 | Load result file and return a result api object. 
295 | :param resFile (str) : file name of result file 296 | :return: res (obj) : result api object 297 | """ 298 | res = COCO() 299 | res.dataset['images'] = [img for img in self.dataset['images']] 300 | 301 | print('Loading and preparing results...') 302 | tic = time.time() 303 | if type(resFile) == str or type(resFile) == unicode: 304 | anns = json.load(open(resFile)) 305 | elif type(resFile) == np.ndarray: 306 | anns = self.loadNumpyAnnotations(resFile) 307 | else: 308 | anns = resFile 309 | assert type(anns) == list, 'results in not an array of objects' 310 | annsImgIds = [ann['image_id'] for ann in anns] 311 | assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \ 312 | 'Results do not correspond to current coco set' 313 | if 'caption' in anns[0]: 314 | imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns]) 315 | res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds] 316 | for id, ann in enumerate(anns): 317 | ann['id'] = id+1 318 | elif 'bbox' in anns[0] and not anns[0]['bbox'] == []: 319 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) 320 | for id, ann in enumerate(anns): 321 | bb = ann['bbox'] 322 | x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]] 323 | if not 'segmentation' in ann: 324 | ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]] 325 | ann['area'] = bb[2]*bb[3] 326 | ann['id'] = id+1 327 | ann['iscrowd'] = 0 328 | elif 'segmentation' in anns[0]: 329 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) 330 | for id, ann in enumerate(anns): 331 | # now only support compressed RLE format as segmentation results 332 | ann['area'] = maskUtils.area(ann['segmentation']) 333 | if not 'bbox' in ann: 334 | ann['bbox'] = maskUtils.toBbox(ann['segmentation']) 335 | ann['id'] = id+1 336 | ann['iscrowd'] = 0 337 | elif 'keypoints' in anns[0]: 338 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) 339 | for id, ann in enumerate(anns): 340 | s = ann['keypoints'] 341 | x = s[0::3] 342 | y = s[1::3] 343 | x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y) 344 | ann['area'] = (x1-x0)*(y1-y0) 345 | ann['id'] = id + 1 346 | ann['bbox'] = [x0,y0,x1-x0,y1-y0] 347 | print('DONE (t={:0.2f}s)'.format(time.time()- tic)) 348 | 349 | res.dataset['annotations'] = anns 350 | res.createIndex() 351 | return res 352 | 353 | def download(self, tarDir = None, imgIds = [] ): 354 | ''' 355 | Download COCO images from mscoco.org server. 
356 | :param tarDir (str): COCO results directory name 357 | imgIds (list): images to be downloaded 358 | :return: 359 | ''' 360 | if tarDir is None: 361 | print('Please specify target directory') 362 | return -1 363 | if len(imgIds) == 0: 364 | imgs = self.imgs.values() 365 | else: 366 | imgs = self.loadImgs(imgIds) 367 | N = len(imgs) 368 | if not os.path.exists(tarDir): 369 | os.makedirs(tarDir) 370 | for i, img in enumerate(imgs): 371 | tic = time.time() 372 | fname = os.path.join(tarDir, img['file_name']) 373 | if not os.path.exists(fname): 374 | urlretrieve(img['coco_url'], fname) 375 | print('downloaded {}/{} images (t={:0.1f}s)'.format(i, N, time.time()- tic)) 376 | 377 | def loadNumpyAnnotations(self, data): 378 | """ 379 | Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class} 380 | :param data (numpy.ndarray) 381 | :return: annotations (python nested list) 382 | """ 383 | print('Converting ndarray to lists...') 384 | assert(type(data) == np.ndarray) 385 | print(data.shape) 386 | assert(data.shape[1] == 7) 387 | N = data.shape[0] 388 | ann = [] 389 | for i in range(N): 390 | if i % 1000000 == 0: 391 | print('{}/{}'.format(i,N)) 392 | ann += [{ 393 | 'image_id' : int(data[i, 0]), 394 | 'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ], 395 | 'score' : data[i, 5], 396 | 'category_id': int(data[i, 6]), 397 | }] 398 | return ann 399 | 400 | def annToRLE(self, ann): 401 | """ 402 | Convert annotation which can be polygons, uncompressed RLE to RLE. 403 | :return: binary mask (numpy 2D array) 404 | """ 405 | t = self.imgs[ann['image_id']] 406 | h, w = t['height'], t['width'] 407 | segm = ann['segmentation'] 408 | if type(segm) == list: 409 | # polygon -- a single object might consist of multiple parts 410 | # we merge all parts into one mask rle code 411 | rles = maskUtils.frPyObjects(segm, h, w) 412 | rle = maskUtils.merge(rles) 413 | elif type(segm['counts']) == list: 414 | # uncompressed RLE 415 | rle = maskUtils.frPyObjects(segm, h, w) 416 | else: 417 | # rle 418 | rle = ann['segmentation'] 419 | return rle 420 | 421 | def annToMask(self, ann): 422 | """ 423 | Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask. 424 | :return: binary mask (numpy 2D array) 425 | """ 426 | rle = self.annToRLE(ann) 427 | m = maskUtils.decode(rle) 428 | return m -------------------------------------------------------------------------------- /utils/pycocotools/mask.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tsungyi' 2 | 3 | #import pycocotools._mask as _mask 4 | from . import _mask 5 | 6 | # Interface for manipulating masks stored in RLE format. 7 | # 8 | # RLE is a simple yet efficient format for storing binary masks. RLE 9 | # first divides a vector (or vectorized image) into a series of piecewise 10 | # constant regions and then for each piece simply stores the length of 11 | # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would 12 | # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] 13 | # (note that the odd counts are always the numbers of zeros). Instead of 14 | # storing the counts directly, additional compression is achieved with a 15 | # variable bitrate representation based on a common scheme called LEB128. 16 | # 17 | # Compression is greatest given large piecewise constant regions. 
18 | # Specifically, the size of the RLE is proportional to the number of 19 | # *boundaries* in M (or for an image the number of boundaries in the y 20 | # direction). Assuming fairly simple shapes, the RLE representation is 21 | # O(sqrt(n)) where n is number of pixels in the object. Hence space usage 22 | # is substantially lower, especially for large simple objects (large n). 23 | # 24 | # Many common operations on masks can be computed directly using the RLE 25 | # (without need for decoding). This includes computations such as area, 26 | # union, intersection, etc. All of these operations are linear in the 27 | # size of the RLE, in other words they are O(sqrt(n)) where n is the area 28 | # of the object. Computing these operations on the original mask is O(n). 29 | # Thus, using the RLE can result in substantial computational savings. 30 | # 31 | # The following API functions are defined: 32 | # encode - Encode binary masks using RLE. 33 | # decode - Decode binary masks encoded via RLE. 34 | # merge - Compute union or intersection of encoded masks. 35 | # iou - Compute intersection over union between masks. 36 | # area - Compute area of encoded masks. 37 | # toBbox - Get bounding boxes surrounding encoded masks. 38 | # frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask. 39 | # 40 | # Usage: 41 | # Rs = encode( masks ) 42 | # masks = decode( Rs ) 43 | # R = merge( Rs, intersect=false ) 44 | # o = iou( dt, gt, iscrowd ) 45 | # a = area( Rs ) 46 | # bbs = toBbox( Rs ) 47 | # Rs = frPyObjects( [pyObjects], h, w ) 48 | # 49 | # In the API the following formats are used: 50 | # Rs - [dict] Run-length encoding of binary masks 51 | # R - dict Run-length encoding of binary mask 52 | # masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order) 53 | # iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore 54 | # bbs - [nx4] Bounding box(es) stored as [x y w h] 55 | # poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list) 56 | # dt,gt - May be either bounding boxes or encoded masks 57 | # Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel). 58 | # 59 | # Finally, a note about the intersection over union (iou) computation. 60 | # The standard iou of a ground truth (gt) and detected (dt) object is 61 | # iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt)) 62 | # For "crowd" regions, we use a modified criteria. If a gt object is 63 | # marked as "iscrowd", we allow a dt to match any subregion of the gt. 64 | # Choosing gt' in the crowd gt that best matches the dt can be done using 65 | # gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing 66 | # iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt) 67 | # For crowd gt regions we use this modified criteria above for the iou. 68 | # 69 | # To compile run "python setup.py build_ext --inplace" 70 | # Please do not contact us for help with compiling. 71 | # 72 | # Microsoft COCO Toolbox. version 2.0 73 | # Data, paper, and tutorials available at: http://mscoco.org/ 74 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 
75 | # Licensed under the Simplified BSD License [see coco/license.txt] 76 | 77 | iou = _mask.iou 78 | merge = _mask.merge 79 | frPyObjects = _mask.frPyObjects 80 | 81 | def encode(bimask): 82 | if len(bimask.shape) == 3: 83 | return _mask.encode(bimask) 84 | elif len(bimask.shape) == 2: 85 | h, w = bimask.shape 86 | return _mask.encode(bimask.reshape((h, w, 1), order='F'))[0] 87 | 88 | def decode(rleObjs): 89 | if type(rleObjs) == list: 90 | return _mask.decode(rleObjs) 91 | else: 92 | return _mask.decode([rleObjs])[:,:,0] 93 | 94 | def area(rleObjs): 95 | if type(rleObjs) == list: 96 | return _mask.area(rleObjs) 97 | else: 98 | return _mask.area([rleObjs])[0] 99 | 100 | def toBbox(rleObjs): 101 | if type(rleObjs) == list: 102 | return _mask.toBbox(rleObjs) 103 | else: 104 | return _mask.toBbox([rleObjs])[0] 105 | -------------------------------------------------------------------------------- /utils/pycocotools/maskApi.c: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #include "maskApi.h" 8 | #include 9 | #include 10 | 11 | uint umin( uint a, uint b ) { return (ab) ? a : b; } 13 | 14 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) { 15 | R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m); 16 | siz j; if(cnts) for(j=0; jcnts[j]=cnts[j]; 17 | } 18 | 19 | void rleFree( RLE *R ) { 20 | free(R->cnts); R->cnts=0; 21 | } 22 | 23 | void rlesInit( RLE **R, siz n ) { 24 | siz i; *R = (RLE*) malloc(sizeof(RLE)*n); 25 | for(i=0; i0 ) { 61 | c=umin(ca,cb); cc+=c; ct=0; 62 | ca-=c; if(!ca && a0) { 83 | crowd=iscrowd!=NULL && iscrowd[g]; 84 | if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; } 85 | siz ka, kb, a, b; uint c, ca, cb, ct, i, u; int va, vb; 86 | ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0; 87 | cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1; 88 | while( ct>0 ) { 89 | c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0; 90 | ca-=c; if(!ca && athr) keep[j]=0; 105 | } 106 | } 107 | } 108 | 109 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) { 110 | double h, w, i, u, ga, da; siz g, d; int crowd; 111 | for( g=0; gthr) keep[j]=0; 129 | } 130 | } 131 | } 132 | 133 | void rleToBbox( const RLE *R, BB bb, siz n ) { 134 | siz i; for( i=0; id?1:c=dy && xs>xe) || (dxye); 173 | if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; } 174 | s = dx>=dy ? 
(double)(ye-ys)/dx : (double)(xe-xs)/dy; 175 | if(dx>=dy) for( d=0; d<=dx; d++ ) { 176 | t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++; 177 | } else for( d=0; d<=dy; d++ ) { 178 | t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++; 179 | } 180 | } 181 | /* get points along y-boundary and downsample */ 182 | free(x); free(y); k=m; m=0; double xd, yd; 183 | x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k); 184 | for( j=1; jw-1 ) continue; 187 | yd=(double)(v[j]h) yd=h; yd=ceil(yd); 189 | x[m]=(int) xd; y[m]=(int) yd; m++; 190 | } 191 | /* compute rle encoding given y-boundary points */ 192 | k=m; a=malloc(sizeof(uint)*(k+1)); 193 | for( j=0; j0) b[m++]=a[j++]; else { 199 | j++; if(jm, p=0; long x; int more; 206 | char *s=malloc(sizeof(char)*m*6); 207 | for( i=0; icnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1; 209 | while( more ) { 210 | char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0; 211 | if(more) c |= 0x20; c+=48; s[p++]=c; 212 | } 213 | } 214 | s[p]=0; return s; 215 | } 216 | 217 | void rleFrString( RLE *R, char *s, siz h, siz w ) { 218 | siz m=0, p=0, k; long x; int more; uint *cnts; 219 | while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0; 220 | while( s[p] ) { 221 | x=0; k=0; more=1; 222 | while( more ) { 223 | char c=s[p]-48; x |= (c & 0x1f) << 5*k; 224 | more = c & 0x20; p++; k++; 225 | if(!more && (c & 0x10)) x |= -1 << 5*k; 226 | } 227 | if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x; 228 | } 229 | rleInit(R,h,w,m,cnts); free(cnts); 230 | } 231 | -------------------------------------------------------------------------------- /utils/pycocotools/maskApi.h: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #pragma once 8 | 9 | typedef unsigned int uint; 10 | typedef unsigned long siz; 11 | typedef unsigned char byte; 12 | typedef double* BB; 13 | typedef struct { siz h, w, m; uint *cnts; } RLE; 14 | 15 | /* Initialize/destroy RLE. */ 16 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ); 17 | void rleFree( RLE *R ); 18 | 19 | /* Initialize/destroy RLE array. */ 20 | void rlesInit( RLE **R, siz n ); 21 | void rlesFree( RLE **R, siz n ); 22 | 23 | /* Encode binary masks using RLE. */ 24 | void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n ); 25 | 26 | /* Decode binary masks encoded via RLE. */ 27 | void rleDecode( const RLE *R, byte *mask, siz n ); 28 | 29 | /* Compute union or intersection of encoded masks. */ 30 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect ); 31 | 32 | /* Compute area of encoded masks. */ 33 | void rleArea( const RLE *R, siz n, uint *a ); 34 | 35 | /* Compute intersection over union between masks. */ 36 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ); 37 | 38 | /* Compute non-maximum suppression between bounding masks */ 39 | void rleNms( RLE *dt, siz n, uint *keep, double thr ); 40 | 41 | /* Compute intersection over union between bounding boxes. 
*/ 42 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ); 43 | 44 | /* Compute non-maximum suppression between bounding boxes */ 45 | void bbNms( BB dt, siz n, uint *keep, double thr ); 46 | 47 | /* Get bounding boxes surrounding encoded masks. */ 48 | void rleToBbox( const RLE *R, BB bb, siz n ); 49 | 50 | /* Convert bounding boxes to encoded masks. */ 51 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ); 52 | 53 | /* Convert polygon to encoded mask. */ 54 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ); 55 | 56 | /* Get compressed string representation of encoded mask. */ 57 | char* rleToString( const RLE *R ); 58 | 59 | /* Convert from compressed string representation of encoded mask. */ 60 | void rleFrString( RLE *R, char *s, siz h, siz w ); 61 | -------------------------------------------------------------------------------- /utils/timer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import time 9 | 10 | 11 | class Timer(object): 12 | """A simple timer.""" 13 | def __init__(self): 14 | self.total_time = 0. 15 | self.calls = 0 16 | self.start_time = 0. 17 | self.diff = 0. 18 | self.average_time = 0. 19 | 20 | def tic(self): 21 | # using time.time instead of time.clock because time.clock 22 | # does not normalize for multithreading 23 | self.start_time = time.time() 24 | 25 | def toc(self, average=True): 26 | self.diff = time.time() - self.start_time 27 | self.total_time += self.diff 28 | self.calls += 1 29 | self.average_time = self.total_time / self.calls 30 | if average: 31 | return self.average_time 32 | else: 33 | return self.diff 34 | 35 | def clear(self): 36 | self.total_time = 0. 37 | self.calls = 0 38 | self.start_time = 0. 39 | self.diff = 0. 40 | self.average_time = 0. 41 | --------------------------------------------------------------------------------
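The RLE helpers in `utils/pycocotools/mask.py` and the `Timer` class above are easiest to see in a short round trip. The sketch below is an illustration only, not a file from the repository: it assumes the repository root is on `PYTHONPATH`, that the `_mask` Cython extension has been compiled (e.g. via `./make.sh`), and that the import paths `utils.pycocotools.mask` and `utils.timer` follow from the directory layout shown above. It encodes a small Fortran-ordered uint8 mask to RLE, decodes it back, and queries area, bounding box, and IoU while timing the calls.

```python
# Usage sketch only -- not part of the repository. Assumes the repo root is on
# PYTHONPATH and the _mask Cython extension has been built (./make.sh).
import numpy as np

from utils.pycocotools import mask as maskUtils
from utils.timer import Timer

# Per the comments in mask.py, masks must be uint8 in column-major (Fortran) order.
m = np.zeros((32, 32), dtype=np.uint8, order='F')
m[8:24, 8:24] = 1

timer = Timer()
timer.tic()
rle = maskUtils.encode(m)        # 2D mask -> single RLE dict
decoded = maskUtils.decode(rle)  # RLE dict -> 2D uint8 mask
elapsed = timer.toc(average=False)

print('round-trip ok :', np.array_equal(m, decoded))
print('area          :', maskUtils.area(rle))               # 256 foreground pixels
print('bbox [x y w h]:', maskUtils.toBbox(rle))              # expected [8. 8. 16. 16.]
print('self IoU      :', maskUtils.iou([rle], [rle], [0]))   # expected [[1.]]
print('encode/decode took {:.4f}s'.format(elapsed))
```

Note that `encode` expects column-major uint8 input, as stated in the mask.py header comment, and that `toc(average=False)` returns the elapsed time of the most recent `tic`/`toc` pair rather than the running average.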