├── .gitignore
├── LICENSE
├── README.md
├── convert_darknet.py
├── data
│   ├── coco.py
│   ├── config.py
│   ├── data_augment.py
│   ├── voc0712.py
│   └── voc_eval.py
├── demo.py
├── eval.py
├── images
│   ├── dog.jpg
│   ├── eagle.jpg
│   └── person.jpg
├── layers
│   ├── multiyolo_loss.py
│   ├── weight_mseloss.py
│   ├── yolo_layer.py
│   └── yolo_loss.py
├── make.sh
├── model
│   ├── __init__.py
│   ├── darknet53.py
│   └── yolo.py
├── output
│   ├── output_dog.jpg
│   ├── output_eagle.jpg
│   └── output_person.jpg
├── train.py
└── utils
    ├── box_utils.py
    ├── build.py
    ├── gen_anchors.py
    ├── nms
    │   ├── __init__.py
    │   ├── cpu_nms.pyx
    │   ├── gpu_nms.hpp
    │   ├── gpu_nms.pyx
    │   ├── nms_kernel.cu
    │   └── py_cpu_nms.py
    ├── nms_wrapper.py
    ├── preprocess.py
    ├── pycocotools
    │   ├── __init__.py
    │   ├── _mask.pyx
    │   ├── coco.py
    │   ├── cocoeval.py
    │   ├── mask.py
    │   ├── maskApi.c
    │   └── maskApi.h
    └── timer.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | *log
28 | *.json
29 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Max deGroot, Ellis Brown
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## YOLOv3 implementation with PyTorch
2 | > This repository only contains the detection module; the cfg file from the original YOLOv3 is not needed because the network is implemented directly in PyTorch.
3 |
4 | This repository is based on the official [YOLOv3](https://github.com/pjreddie/darknet) code and on [pytorch-yolo-v3](https://github.com/ayooshkathuria/pytorch-yolo-v3). There is already another YOLOv3 implementation for PyTorch, but it builds the network from a config file rather than in the usual PyTorch way. One of the goals of this repository is to remove the cfg file.
5 |
6 | ## Requirements
7 |
8 | * Python 3.5
9 | * OpenCV
10 | * PyTorch 0.4
11 |
12 | ## Installation
13 |
14 | * Install PyTorch-0.4.0 by selecting your environment on the website and running the appropriate command.
15 | * Clone this repository
16 | * Compile the NMS extension
17 | * Convert yolov3.weights to a PyTorch checkpoint
18 |
19 | ```shell
20 | cd YOLOv3_Pytorch
21 | ./make.sh
22 |
23 | mkdir weights
24 | cd weights
25 | wget https://pjreddie.com/media/files/yolov3.weights
26 | cd ..
27 | python convert_darknet.py --version coco --weights ./weights/yolov3.weights --save_name ./weights/convert_yolov3_coco.pth
28 | # this produces ./weights/convert_yolov3_coco.pth
29 | ```
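
To check that the conversion worked, the converted checkpoint can be loaded directly into the PyTorch model. A minimal sketch (mirroring what `demo.py` does; the path and input size are just the defaults used above):

```python
import torch

from model.yolo import Yolov3
from data.config import coco_config as cfg

# build the COCO model in test mode and load the converted darknet weights
net = Yolov3("test", (416, 416), cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"])
net.load_state_dict(torch.load("./weights/convert_yolov3_coco.pth"))
net.eval()
```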
30 |
31 | ## Train
32 | > We only train on the VOC dataset because we do not have enough GPUs to train on COCO. This is still an experimental repository, and it does not yet reproduce the original results very well.
33 |
34 | ### dataset
35 | [merge VOC dataset](https://github.com/yqyao/DRFNet#voc-dataset)
36 |
37 | * structure
38 |
39 | ./data/datasets/VOCdevkit0712/VOC0712/Annotations
40 | ./data/datasets/VOCdevkit0712/VOC0712/ImageSets
41 | ./data/datasets/VOCdevkit0712/VOC0712/JPEGImages
42 |
43 | * COCO
44 |
45 | Same as [COCO](https://github.com/yqyao/DRFNet#coco-dataset)
46 |
47 | ### train
48 | > You can enable multiscale training by setting `multiscale` in `voc_config` in `data/config.py`; see the excerpt below.
49 |
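The relevant flag lives in `data/config.py`; when `multiscale` is `True`, the dataset picks a new input size (a random multiple of 32 between 320 and 608) every 20 batches (excerpt, other keys unchanged):

```python
voc_config = {
    # ... anchors, root, num_classes, name_path, anchors_mask ...
    'multiscale': True,   # set to False to train at a fixed input_wh
}
```
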
50 | * convert weights
51 | ```shell
52 | cd weights
53 | wget https://pjreddie.com/media/files/darknet53.conv.74
54 | cd ../
55 | python convert_darknet.py --version darknet53 --weights ./weights/darknet53.conv.74 --save_name ./weights/convert_darknet53.pth
56 | ```
57 |
58 | * train yolov3
59 |
60 | ```shell
61 | python train.py --input_wh 416 416 -b 64 --subdivisions 4 -d VOC --basenet ./weights/convert_darknet53.pth
62 |
63 | ```
64 |
65 | ### eval
66 |
67 | ```shell
68 |
69 | python eval.py --weights ./weights/convert_yolov3_voc.pth --dataset VOC --input_wh 416 416
70 | ```
71 | > "darknet voc" models were trained with the original darknet framework; "pytorch voc" models were trained with this repository.
72 |
73 | **results**
74 |
75 | |  | darknet voc 608 | darknet voc 416 | pytorch voc 608 | pytorch voc 416 |
76 | | :-: | :-: | :-: | :-: | :-: |
77 | | mAP | 77.2% | 76.2% | 74.7% | 74.9% |
78 | | time | 27ms | 18ms | 27ms | 18ms |
79 |
80 | ## Demo
81 |
82 | ```shell
83 |
84 | python demo.py --images images --save_path ./output --weights ./weights/convert_yolov3_coco.pth -d COCO
85 |
86 | ```
87 |
88 | ## Example
89 |
90 | Example detections produced by the demo command above are saved to `./output/` (see `output/output_dog.jpg`, `output/output_eagle.jpg` and `output/output_person.jpg`).
91 |
92 |
93 | ## References
94 | - [YOLOv3: An Incremental Improvement](https://pjreddie.com/media/files/papers/YOLOv3.pdf)
95 |
96 | - [Original Implementation (Darknet)](https://github.com/pjreddie/darknet)
97 |
98 | - [pytorch-yolo-v3](https://github.com/ayooshkathuria/pytorch-yolo-v3)
99 |
100 | - [pytorch-yolo2](https://github.com/marvis/pytorch-yolo2)
101 |
--------------------------------------------------------------------------------
/convert_darknet.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 | #
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.autograd import Variable
8 | import numpy as np
9 | from data.config import voc_config, coco_config
10 | from model.yolo import Yolov3
11 | from model.darknet53 import Darknet53
12 | import argparse
13 | import os
14 |
15 | def copy_weights(bn, conv, ptr, weights, use_bn=True):
16 | if use_bn:
17 | num_bn_biases = bn.bias.numel()
18 |
19 | #Load the weights
20 | bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
21 | ptr += num_bn_biases
22 |
23 | bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
24 | ptr += num_bn_biases
25 |
26 | bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
27 | ptr += num_bn_biases
28 |
29 | bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
30 | ptr += num_bn_biases
31 |
32 | #Cast the loaded weights into dims of model weights.
33 | bn_biases = bn_biases.view_as(bn.bias.data)
34 | bn_weights = bn_weights.view_as(bn.weight.data)
35 | bn_running_mean = bn_running_mean.view_as(bn.running_mean)
36 | bn_running_var = bn_running_var.view_as(bn.running_var)
37 |
38 | #Copy the data to model
39 | bn.bias.data.copy_(bn_biases)
40 | bn.weight.data.copy_(bn_weights)
41 | bn.running_mean.copy_(bn_running_mean)
42 | bn.running_var.copy_(bn_running_var)
43 | else:
44 | #Number of biases
45 | num_biases = conv.bias.numel()
46 |
47 | #Load the weights
48 | conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases])
49 | ptr = ptr + num_biases
50 |
51 | #reshape the loaded weights according to the dims of the model weights
52 | conv_biases = conv_biases.view_as(conv.bias.data)
53 |
54 | #Finally copy the data
55 | conv.bias.data.copy_(conv_biases)
56 |
57 | #Let us load the weights for the Convolutional layers
58 | num_weights = conv.weight.numel()
59 | conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights])
60 | ptr = ptr + num_weights
61 |
62 | conv_weights = conv_weights.view_as(conv.weight.data)
63 | conv.weight.data.copy_(conv_weights)
64 | return ptr
65 |
66 | def load_weights_darknet53(weightfile, yolov3):
67 | fp = open(weightfile, "rb")
68 | #The first 5 values are header information
69 | # 1. Major version number
70 | # 2. Minor Version Number
71 | # 3. Subversion number
72 |     # 4, 5. Images seen
73 | header = np.fromfile(fp, dtype = np.int32, count = 5)
74 | weights = np.fromfile(fp, dtype = np.float32)
75 | print(len(weights))
76 | ptr = 0
77 | first_conv = yolov3.conv
78 | bn = first_conv.bn
79 | conv = first_conv.conv
80 | # first conv copy
81 | ptr = copy_weights(bn, conv, ptr, weights)
82 |
83 | layers = [yolov3.layer1, yolov3.layer2, yolov3.layer3, yolov3.layer4, yolov3.layer5]
84 | for layer in layers:
85 | for i in range(len(layer)):
86 | if i == 0:
87 | bn = layer[i].bn
88 | conv = layer[i].conv
89 | ptr = copy_weights(bn, conv, ptr, weights)
90 | else:
91 | bn = layer[i].conv1.bn
92 | conv = layer[i].conv1.conv
93 | ptr = copy_weights(bn, conv, ptr, weights)
94 | bn = layer[i].conv2.bn
95 | conv = layer[i].conv2.conv
96 | ptr = copy_weights(bn, conv, ptr, weights)
97 | print(ptr)
98 | fp.close()
99 |
100 | def load_weights(weightfile, yolov3, version):
101 | if version == "voc" or version == "coco":
102 | load_weights_yolov3(weightfile, yolov3)
103 | elif version == "darknet53":
104 | load_weights_darknet53(weightfile, yolov3)
105 |
106 | def load_weights_yolov3(weightfile, yolov3):
107 | fp = open(weightfile, "rb")
108 | #The first 5 values are header information
109 | # 1. Major version number
110 | # 2. Minor Version Number
111 | # 3. Subversion number
112 |     # 4, 5. Images seen
113 | header = np.fromfile(fp, dtype = np.int32, count = 5)
114 | weights = np.fromfile(fp, dtype = np.float32)
115 | print(len(weights))
116 | ptr = 0
117 | extractor = yolov3.extractor
118 | first_conv = extractor.conv
119 | bn = first_conv.bn
120 | conv = first_conv.conv
121 | # first conv copy
122 | ptr = copy_weights(bn, conv, ptr, weights)
123 |
124 | layers = [extractor.layer1, extractor.layer2, extractor.layer3, extractor.layer4, extractor.layer5]
125 | for layer in layers:
126 | for i in range(len(layer)):
127 | if i == 0:
128 | bn = layer[i].bn
129 | conv = layer[i].conv
130 | ptr = copy_weights(bn, conv, ptr, weights)
131 | else:
132 | bn = layer[i].conv1.bn
133 | conv = layer[i].conv1.conv
134 | ptr = copy_weights(bn, conv, ptr, weights)
135 | bn = layer[i].conv2.bn
136 | conv = layer[i].conv2.conv
137 | ptr = copy_weights(bn, conv, ptr, weights)
138 | predict_conv_list1 = yolov3.predict_conv_list1
139 | smooth_conv1 = yolov3.smooth_conv1
140 | predict_conv_list2 = yolov3.predict_conv_list2
141 | smooth_conv2 = yolov3.smooth_conv2
142 | predict_conv_list3 = yolov3.predict_conv_list3
143 | for i in range(len(predict_conv_list1)):
144 | if i == 6:
145 | bn = 0
146 | conv = predict_conv_list1[i]
147 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False)
148 | else:
149 | bn = predict_conv_list1[i].bn
150 | conv = predict_conv_list1[i].conv
151 | ptr = copy_weights(bn, conv, ptr, weights)
152 | bn = smooth_conv1.bn
153 | conv = smooth_conv1.conv
154 | ptr = copy_weights(bn, conv, ptr, weights)
155 | for i in range(len(predict_conv_list2)):
156 | if i == 6:
157 | bn = 0
158 | conv = predict_conv_list2[i]
159 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False)
160 | else:
161 | bn = predict_conv_list2[i].bn
162 | conv = predict_conv_list2[i].conv
163 | ptr = copy_weights(bn, conv, ptr, weights)
164 | bn = smooth_conv2.bn
165 | conv = smooth_conv2.conv
166 | ptr = copy_weights(bn, conv, ptr, weights)
167 |
168 | for i in range(len(predict_conv_list3)):
169 | if i == 6:
170 | bn = 0
171 | conv = predict_conv_list3[i]
172 | ptr = copy_weights(bn, conv, ptr, weights, use_bn=False)
173 | else:
174 | bn = predict_conv_list3[i].bn
175 | conv = predict_conv_list3[i].conv
176 | ptr = copy_weights(bn, conv, ptr, weights)
177 | print(ptr)
178 | fp.close()
179 |
180 |
181 | def arg_parse():
182 | """
183 |     Parse arguments for the darknet weight conversion script
184 | """
185 | parser = argparse.ArgumentParser(
186 |         description='Yolov3 darknet weight conversion')
187 | parser.add_argument('--input_wh', default=(416, 416),
188 | help='input size.')
189 |     parser.add_argument('--version', default='coco',
190 | help='voc, coco, darknet53')
191 | parser.add_argument('--weights', default='./weights/yolov3.weights', help='pretrained base model')
192 | parser.add_argument('--save_name', default='./weights/convert_yolov3_coco.pth', help='save name')
193 |
194 | return parser.parse_args()
195 |
196 | def load_weights_darknet19(weightfile, darknet19):
197 | fp = open(weightfile, "rb")
198 | #The first 4 values are header information
199 | # 1. Major version number
200 | # 2. Minor Version Number
201 | # 3. Subversion number
202 |     # 4. Images seen
203 | header = np.fromfile(fp, dtype = np.int32, count=4)
204 | weights = np.fromfile(fp, dtype = np.float32)
205 | ptr = 0
206 | first_conv = darknet19.conv
207 | bn = first_conv.bn
208 | conv = first_conv.conv
209 | # first conv copy
210 | ptr = copy_weights(bn, conv, ptr, weights)
211 | layers = [darknet19.layer1, darknet19.layer2, darknet19.layer3, darknet19.layer4, darknet19.layer5]
212 | for layer in layers:
213 | for i in range(len(layer)):
214 | if i == 0:
215 | pass
216 | else:
217 | bn = layer[i].bn
218 | conv = layer[i].conv
219 | ptr = copy_weights(bn, conv, ptr, weights)
220 | fp.close()
221 |
222 | if __name__ == '__main__':
223 | args = arg_parse()
224 | weightfile = args.weights
225 | input_wh = args.input_wh
226 | version = args.version
227 | save_name = args.save_name
228 | if version == "voc":
229 | cfg = voc_config
230 | yolov3 = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"])
231 | elif version == "coco":
232 | cfg = coco_config
233 | yolov3 = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"])
234 | elif version == "darknet53":
235 | cfg = voc_config
236 | num_blocks = [1,2,8,8,4]
237 | yolov3 = Darknet53(num_blocks)
238 | else:
239 |         print("Unknown version!!!")
240 | import sys
241 | sys.exit()
242 |
243 | load_weights(weightfile, yolov3, version)
244 | # name = "convert_yolo_" + version + ".pth"
245 | # save_path = os.path.join("./weights", name)
246 | torch.save(yolov3.state_dict(), save_name)
247 |
248 |
249 |
250 |
--------------------------------------------------------------------------------
/data/coco.py:
--------------------------------------------------------------------------------
1 | """VOC Dataset Classes
2 |
3 | Original author: Francisco Massa
4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py
5 |
6 | Updated by: Ellis Brown, Max deGroot
7 | """
8 |
9 | import os
10 | import pickle
11 | import os.path
12 | import sys
13 | import torch
14 | import torch.utils.data as data
15 | import torchvision.transforms as transforms
16 | import cv2
17 | import numpy as np
18 | import json
19 | import uuid, random  # random is used by the multiscale branch in __getitem__
20 | from .data_augment import preproc
21 |
22 | from pycocotools.coco import COCO
23 | from pycocotools.cocoeval import COCOeval
24 | from pycocotools import mask as COCOmask
25 |
26 | class COCOAnnotationTransform(object):
27 |     """Transforms a COCO annotation into a Tensor of bbox coords and label index
28 |
29 |     The target is an array of [xmin, ymin, xmax, ymax, label] rows in pixel
30 |     coordinates; __call__ normalizes the boxes by the image width and height
31 |     and filters out boxes whose shorter side is below 1% of the image.
32 |     """
38 |
39 | def __init__(self):
40 | pass
41 |
42 | def __call__(self, target, width, height):
43 | """
44 | Arguments:
45 | target (annotation) : the target annotation to be made usable
46 | will be not normlized
47 | Returns:
48 | a list containing lists of bounding boxes [bbox coords, class name]
49 | """
50 |
51 | boxes = target[:,:-1].copy()
52 | labels = target[:,-1].copy()
53 | boxes[:, 0::2] /= width
54 | boxes[:, 1::2] /= height
55 | b_w = (boxes[:, 2] - boxes[:, 0])*1.
56 | b_h = (boxes[:, 3] - boxes[:, 1])*1.
57 | mask_b= np.minimum(b_w, b_h) > 0.01
58 | boxes_t = boxes[mask_b]
59 | labels_t = labels[mask_b].copy()
60 |
61 | return boxes_t, labels_t
62 |
63 |
64 | class COCODetection(data.Dataset):
65 |
66 |     """COCO Detection Dataset Object
67 |
68 |     input is image, target is annotation
69 |
70 |     Arguments:
71 |         root (string): filepath to the COCO root folder (containing 'images' and 'annotations')
72 |         image_sets (list): (year, image_set) pairs to load, e.g. [('2014', 'train')]
73 |         resize_wh (tuple): network input size as (width, height)
74 |         batch_size (int): batch size, used to decide when to pick a new multiscale input size
75 |         multiscale (bool): if True, randomly rescale the input size every 20 batches
76 |         dataset_name (string, optional): which dataset to load (default: 'COCO')
77 |     """
81 |
82 | def __init__(self, root, image_sets, resize_wh, batch_size, multiscale=False, dataset_name='COCO'):
83 | self.root = root
84 | self.cache_path = os.path.join(self.root, 'cache')
85 | self.image_set = image_sets
86 | self.name = dataset_name
87 | self.resize_wh = resize_wh
88 | self.batch_size = batch_size
89 | self.multiscale = multiscale
90 | self.transform = preproc()
91 | self.ids = list()
92 | self.annotations = list()
93 | self._view_map = {
94 | 'minival2014' : 'val2014', # 5k val2014 subset
95 | 'valminusminival2014' : 'val2014', # val2014 \setminus minival2014
96 | 'test-dev2015' : 'test2015',
97 | }
98 |
99 | for (year, image_set) in image_sets:
100 | coco_name = image_set+year
101 | data_name = (self._view_map[coco_name]
102 | if coco_name in self._view_map
103 | else coco_name)
104 | annofile = self._get_ann_file(coco_name)
105 | _COCO = COCO(annofile)
106 | self._COCO = _COCO
107 | self.coco_name = coco_name
108 | cats = _COCO.loadCats(_COCO.getCatIds())
109 | self._classes = tuple([c['name'] for c in cats])
110 | self.num_classes = len(self._classes)
111 | self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
112 | self._class_to_coco_cat_id = dict(zip([c['name'] for c in cats],
113 | _COCO.getCatIds()))
114 | indexes = _COCO.getImgIds()
115 | self.image_indexes = indexes
116 | self.ids.extend([self.image_path_from_index(data_name, index) for index in indexes ])
117 | if image_set.find('test') != -1:
118 | print('test set will not load annotations!')
119 | else:
120 | self.annotations.extend(self._load_coco_annotations(coco_name, indexes,_COCO))
121 |
122 |
123 |
124 | def image_path_from_index(self, name, index):
125 | """
126 | Construct an image path from the image's "index" identifier.
127 | """
128 | # Example image path for index=119993:
129 | # images/train2014/COCO_train2014_000000119993.jpg
130 | file_name = ('COCO_' + name + '_' +
131 | str(index).zfill(12) + '.jpg')
132 | image_path = os.path.join(self.root, 'images',
133 | name, file_name)
134 | assert os.path.exists(image_path), \
135 | 'Path does not exist: {}'.format(image_path)
136 | return image_path
137 |
138 |
139 | def _get_ann_file(self, name):
140 | prefix = 'instances' if name.find('test') == -1 \
141 | else 'image_info'
142 | return os.path.join(self.root, 'annotations',
143 | prefix + '_' + name + '.json')
144 |
145 |
146 | def _load_coco_annotations(self, coco_name, indexes, _COCO):
147 | cache_file=os.path.join(self.cache_path,coco_name+'_gt_roidb.pkl')
148 | if os.path.exists(cache_file):
149 | with open(cache_file, 'rb') as fid:
150 | roidb = pickle.load(fid)
151 | print('{} gt roidb loaded from {}'.format(coco_name,cache_file))
152 | return roidb
153 |
154 | gt_roidb = [self._annotation_from_index(index, _COCO)
155 | for index in indexes]
156 | with open(cache_file, 'wb') as fid:
157 | pickle.dump(gt_roidb,fid,pickle.HIGHEST_PROTOCOL)
158 | print('wrote gt roidb to {}'.format(cache_file))
159 | return gt_roidb
160 |
161 |
162 | def _annotation_from_index(self, index, _COCO):
163 | """
164 | Loads COCO bounding-box instance annotations. Crowd instances are
165 | handled by marking their overlaps (with all categories) to -1. This
166 | overlap value means that crowd "instances" are excluded from training.
167 | """
168 | im_ann = _COCO.loadImgs(index)[0]
169 | width = im_ann['width']
170 | height = im_ann['height']
171 |
172 | annIds = _COCO.getAnnIds(imgIds=index, iscrowd=None)
173 | objs = _COCO.loadAnns(annIds)
174 | # Sanitize bboxes -- some are invalid
175 | valid_objs = []
176 | for obj in objs:
177 | x1 = np.max((0, obj['bbox'][0]))
178 | y1 = np.max((0, obj['bbox'][1]))
179 | x2 = np.min((width - 1, x1 + np.max((0, obj['bbox'][2] - 1))))
180 | y2 = np.min((height - 1, y1 + np.max((0, obj['bbox'][3] - 1))))
181 | if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
182 | obj['clean_bbox'] = [x1, y1, x2, y2]
183 | valid_objs.append(obj)
184 | objs = valid_objs
185 | num_objs = len(objs)
186 |
187 | res = np.zeros((num_objs, 5))
188 |
189 | # Lookup table to map from COCO category ids to our internal class
190 | # indices
191 | coco_cat_id_to_class_ind = dict([(self._class_to_coco_cat_id[cls],
192 | self._class_to_ind[cls])
193 | for cls in self._classes])
194 |
195 | for ix, obj in enumerate(objs):
196 | cls = coco_cat_id_to_class_ind[obj['category_id']]
197 | res[ix, 0:4] = obj['clean_bbox']
198 | res[ix, 4] = cls
199 |
200 | return res
201 |
202 |
203 |
204 | def __getitem__(self, index):
205 | img_id = self.ids[index]
206 | target = self.annotations[index]
207 | if self.multiscale:
208 | if index % (self.batch_size * 20) == 0:
209 | rnd = (random.randint(0,9) + 10) * 32
210 | print("resize scale", index, rnd)
211 | self.resize_wh = (rnd, rnd)
212 | img = cv2.imread(img_id, cv2.IMREAD_COLOR)
213 | height, width, _ = img.shape
214 |
215 | if self.transform is not None:
216 | img, target = self.transform(img, target, self.resize_wh)
217 |
218 | return img, target
219 |
220 | def __len__(self):
221 | return len(self.ids)
222 |
223 | def pull_image(self, index):
224 | '''Returns the original image object at index in PIL form
225 |
226 | Note: not using self.__getitem__(), as any transformations passed in
227 | could mess up this functionality.
228 |
229 | Argument:
230 | index (int): index of img to show
231 | Return:
232 | PIL img
233 | '''
234 | img_id = self.ids[index]
235 | return cv2.imread(img_id, cv2.IMREAD_COLOR), img_id
236 |
237 |
238 | def pull_tensor(self, index):
239 | '''Returns the original image at an index in tensor form
240 |
241 | Note: not using self.__getitem__(), as any transformations passed in
242 | could mess up this functionality.
243 |
244 | Argument:
245 | index (int): index of img to show
246 | Return:
247 | tensorized version of img, squeezed
248 | '''
249 | to_tensor = transforms.ToTensor()
250 |         return torch.Tensor(self.pull_image(index)[0]).unsqueeze_(0)
251 |
252 | def _print_detection_eval_metrics(self, coco_eval):
253 | IoU_lo_thresh = 0.5
254 | IoU_hi_thresh = 0.95
255 | def _get_thr_ind(coco_eval, thr):
256 | ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
257 | (coco_eval.params.iouThrs < thr + 1e-5))[0][0]
258 | iou_thr = coco_eval.params.iouThrs[ind]
259 | assert np.isclose(iou_thr, thr)
260 | return ind
261 |
262 | ind_lo = _get_thr_ind(coco_eval, IoU_lo_thresh)
263 | ind_hi = _get_thr_ind(coco_eval, IoU_hi_thresh)
264 | # precision has dims (iou, recall, cls, area range, max dets)
265 | # area range index 0: all area ranges
266 | # max dets index 2: 100 per image
267 | precision = \
268 | coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
269 | ap_default = np.mean(precision[precision > -1])
270 | print('~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] '
271 | '~~~~'.format(IoU_lo_thresh, IoU_hi_thresh))
272 | print('{:.1f}'.format(100 * ap_default))
273 | for cls_ind, cls in enumerate(self._classes):
274 | if cls == '__background__':
275 | continue
276 | # minus 1 because of __background__
277 | precision = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2]
278 | ap = np.mean(precision[precision > -1])
279 | print('{:.1f}'.format(100 * ap))
280 |
281 | print('~~~~ Summary metrics ~~~~')
282 | coco_eval.summarize()
283 |
284 | def _do_detection_eval(self, res_file, output_dir):
285 | ann_type = 'bbox'
286 | coco_dt = self._COCO.loadRes(res_file)
287 | coco_eval = COCOeval(self._COCO, coco_dt)
288 | coco_eval.params.useSegm = (ann_type == 'segm')
289 | coco_eval.evaluate()
290 | coco_eval.accumulate()
291 | self._print_detection_eval_metrics(coco_eval)
292 | eval_file = os.path.join(output_dir, 'detection_results.pkl')
293 | with open(eval_file, 'wb') as fid:
294 | pickle.dump(coco_eval, fid, pickle.HIGHEST_PROTOCOL)
295 | print('Wrote COCO eval results to: {}'.format(eval_file))
296 |
297 | def _coco_results_one_category(self, boxes, cat_id):
298 | results = []
299 | for im_ind, index in enumerate(self.image_indexes):
300 | # print(type(boxes[im_ind]))
301 | # print(boxes[im_ind])
302 | # dets = boxes[im_ind].astype(np.float)
303 |             dets = boxes[im_ind]
304 |             if len(dets) == 0:
305 |                 continue
306 |             dets = dets.astype(np.float)
307 | scores = dets[:, -1]
308 | xs = dets[:, 0]
309 | ys = dets[:, 1]
310 | ws = dets[:, 2] - xs + 1
311 | hs = dets[:, 3] - ys + 1
312 | results.extend(
313 | [{'image_id' : index,
314 | 'category_id' : cat_id,
315 | 'bbox' : [xs[k], ys[k], ws[k], hs[k]],
316 | 'score' : scores[k]} for k in range(dets.shape[0])])
317 | return results
318 |
319 | def _write_coco_results_file(self, all_boxes, res_file):
320 | # [{"image_id": 42,
321 | # "category_id": 18,
322 | # "bbox": [258.15,41.29,348.26,243.78],
323 | # "score": 0.236}, ...]
324 | results = []
325 | for cls_ind, cls in enumerate(self._classes):
326 | if cls == '__background__':
327 | continue
328 | print('Collecting {} results ({:d}/{:d})'.format(cls, cls_ind,
329 | self.num_classes ))
330 | coco_cat_id = self._class_to_coco_cat_id[cls]
331 | results.extend(self._coco_results_one_category(all_boxes[cls_ind],
332 | coco_cat_id))
333 | '''
334 | if cls_ind ==30:
335 | res_f = res_file+ '_1.json'
336 | print('Writing results json to {}'.format(res_f))
337 | with open(res_f, 'w') as fid:
338 | json.dump(results, fid)
339 | results = []
340 | '''
341 | #res_f2 = res_file+'_2.json'
342 | print('Writing results json to {}'.format(res_file))
343 | with open(res_file, 'w') as fid:
344 | json.dump(results, fid)
345 |
346 | def evaluate_detections(self, all_boxes, output_dir):
347 | res_file = os.path.join(output_dir, ('detections_' +
348 | self.coco_name +
349 | '_results'))
350 | res_file += '.json'
351 | self._write_coco_results_file(all_boxes, res_file)
352 | # Only do evaluation on non-test sets
353 | if self.coco_name.find('test') == -1:
354 | self._do_detection_eval(res_file, output_dir)
355 | # Optionally cleanup results json file
356 |
357 |
--------------------------------------------------------------------------------
/data/config.py:
--------------------------------------------------------------------------------
1 | # config.py
2 | import os
3 | import os.path
4 |
5 | pwd = os.getcwd()
6 | VOCroot = os.path.join(pwd, "data/datasets/VOCdevkit0712/")
7 | COCOroot = os.path.join(pwd, "data/datasets/coco2015")
8 |
9 | datasets_dict = {"VOC": [('0712', '0712_trainval')],
10 | "VOC0712++": [('0712', '0712_trainval_test')],
11 | "VOC2012" : [('2012', '2012_trainval')],
12 | "COCO": [('2014', 'train'), ('2014', 'valminusminival')],
13 | "VOC2007": [('0712', "2007_test")],
14 | "COCOval": [('2014', 'minival')]}
15 |
16 |
17 | voc_config = {
18 | 'anchors' : [[116, 90], [156, 198], [373, 326],
19 | [30, 61], [62, 45], [59, 119],
20 | [10, 13], [16, 30], [33, 23]],
21 | 'root': VOCroot,
22 | 'num_classes': 20,
23 | 'multiscale' : True,
24 | 'name_path' : "./model/voc.names",
25 | 'anchors_mask' : [[0,1,2], [3,4,5], [6,7,8]]
26 | }
27 |
28 | coco_config = {
29 | 'anchors' : [[116, 90], [156, 198], [373, 326],
30 | [30, 61], [62, 45], [59, 119],
31 | [10, 13], [16, 30], [33, 23]],
32 | 'root': COCOroot,
33 | 'num_classes': 80,
34 | 'multiscale' : True,
35 | 'name_path' : "./model/coco.names",
36 | 'anchors_mask' : [[0,1,2], [3,4,5], [6,7,8]]
37 | }
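# Note: each anchor is a (width, height) prior at the network input scale, and
# 'anchors_mask' splits the nine anchors into three groups of three, one group per
# YOLOv3 detection scale (consumed by model/yolo.py and the yolo layers).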
38 |
39 | # anchors = [[214,327], [326,193], [359,359],
40 | # [116,286], [122,97], [171,180],
41 | # [24,34], [46,84], [68,185]]
--------------------------------------------------------------------------------
/data/data_augment.py:
--------------------------------------------------------------------------------
1 | """Data augmentation functionality. Passed as callable transformations to
2 | Dataset classes.
3 |
4 | The data augmentation procedures were interpreted from @weiliu89's SSD paper
5 | http://arxiv.org/abs/1512.02325
6 |
7 | TODO: implement data_augment for training
8 |
9 | Ellis Brown, Max deGroot
10 | """
11 |
12 | import torch
13 | from torchvision import transforms
14 | import cv2
15 | import numpy as np
16 | import random
17 | import math
18 |
19 |
20 | def matrix_iou(a,b):
21 | """
22 |     return iou of a and b, numpy version for data augmentation
23 | """
24 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2])
25 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:])
26 |
27 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2)
28 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
29 | area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
30 | return area_i / (area_a[:, np.newaxis] + area_b - area_i)
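# Toy example (illustration only): for a = np.array([[0, 0, 2, 2]]) and
# b = np.array([[1, 1, 3, 3]]) the intersection area is 1 and the union is
# 4 + 4 - 1 = 7, so matrix_iou(a, b) returns [[1/7]].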
31 |
32 | def _crop(image, boxes, labels):
33 | height, width, _ = image.shape
34 |
35 | if len(boxes)== 0:
36 | return image, boxes, labels
37 |
38 | while True:
39 | mode = random.choice((
40 | None,
41 | (0.3, None),
42 | (0.5, None),
43 | (0.7, None),
44 | (0.9, None),
45 | # (None, None),
46 | ))
47 |
48 | if mode is None:
49 | return image, boxes, labels
50 |
51 | min_iou, max_iou = mode
52 | if min_iou is None:
53 | min_iou = float('-inf')
54 | if max_iou is None:
55 | max_iou = float('inf')
56 |
57 | for _ in range(50):
58 | scale = random.uniform(0.3,1.)
59 | min_ratio = max(0.5, scale*scale)
60 | max_ratio = min(2, 1. / scale / scale)
61 | ratio = math.sqrt(random.uniform(min_ratio, max_ratio))
62 | w = int(scale * ratio * width)
63 | h = int((scale / ratio) * height)
64 |
65 |
66 | l = random.randrange(width - w)
67 | t = random.randrange(height - h)
68 | roi = np.array((l, t, l + w, t + h))
69 |
70 | iou = matrix_iou(boxes, roi[np.newaxis])
71 |
72 | if not (min_iou <= iou.min() and iou.max() <= max_iou):
73 | continue
74 |
75 | image_t = image[roi[1]:roi[3], roi[0]:roi[2]]
76 |
77 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2
78 | mask = np.logical_and(roi[:2] < centers, centers < roi[2:]) \
79 | .all(axis=1)
80 | boxes_t = boxes[mask].copy()
81 | labels_t = labels[mask].copy()
82 | if len(boxes_t) == 0:
83 | continue
84 |
85 | boxes_t[:, :2] = np.maximum(boxes_t[:, :2], roi[:2])
86 | boxes_t[:, :2] -= roi[:2]
87 | boxes_t[:, 2:] = np.minimum(boxes_t[:, 2:], roi[2:])
88 | boxes_t[:, 2:] -= roi[:2]
89 |
90 | return image_t, boxes_t,labels_t
91 |
92 |
93 | def _distort(image):
94 | def _convert(image, alpha=1, beta=0):
95 | tmp = image.astype(float) * alpha + beta
96 | tmp[tmp < 0] = 0
97 | tmp[tmp > 255] = 255
98 | image[:] = tmp
99 |
100 | image = image.copy()
101 |
102 | if random.randrange(2):
103 | _convert(image, beta=random.uniform(-32, 32))
104 |
105 | if random.randrange(2):
106 | _convert(image, alpha=random.uniform(0.5, 1.5))
107 |
108 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
109 |
110 | if random.randrange(2):
111 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18)
112 | tmp %= 180
113 | image[:, :, 0] = tmp
114 |
115 | if random.randrange(2):
116 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5))
117 |
118 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
119 |
120 | return image
121 |
122 |
123 | def _expand(image, boxes,fill, p):
124 | if random.random() > p:
125 | return image, boxes
126 |
127 | height, width, depth = image.shape
128 | for _ in range(50):
129 | scale = random.uniform(1,4)
130 |
131 | min_ratio = max(0.5, 1./scale/scale)
132 | max_ratio = min(2, scale*scale)
133 | ratio = math.sqrt(random.uniform(min_ratio, max_ratio))
134 | ws = scale * ratio
135 | hs = scale / ratio
136 | if ws < 1 or hs < 1:
137 | continue
138 | w = int(ws * width)
139 | h = int(hs * height)
140 |
141 | left = random.randint(0, w - width)
142 | top = random.randint(0, h - height)
143 |
144 | boxes_t = boxes.copy()
145 | boxes_t[:, :2] += (left, top)
146 | boxes_t[:, 2:] += (left, top)
147 |
148 |
149 | expand_image = np.empty(
150 | (h, w, depth),
151 | dtype=image.dtype)
152 | expand_image[:, :] = fill
153 | expand_image[top:top + height, left:left + width] = image
154 | image = expand_image
155 |
156 | return image, boxes_t
157 |
158 |
159 | def _mirror(image, boxes):
160 | _, width, _ = image.shape
161 | if random.randrange(2):
162 | image = image[:, ::-1]
163 | boxes = boxes.copy()
164 | boxes[:, 0::2] = width - boxes[:, 2::-2]
165 | return image, boxes
166 |
167 | def rand(a=0, b=1):
168 | return np.random.rand()*(b-a) + a
169 |
170 | # def random_letterbox_image(img, resize_wh, boxes, jitter=0.3):
171 | # '''resize image with unchanged aspect ratio using padding'''
172 | # img_w, img_h = img.shape[1], img.shape[0]
173 | # w, h = resize_wh
174 | # new_ar = w / h * rand(1-jitter, 1+jitter)/rand(1-jitter, 1+jitter)
175 | # scale = rand(.25, 2)
176 | # if new_ar < 1:
177 | # nh = int(scale * h)
178 | # nw = int(nh * new_ar)
179 | # else:
180 | # nw = int(scale * w)
181 | # nh = int(nw / new_ar)
182 | # resized_image = cv2.resize(img, (nw, nh), interpolation = cv2.INTER_CUBIC)
183 |
184 | # dx = int(rand(0, w - nw))
185 | # dy = int(rand(0, h - nh))
186 |
187 | # if (w - nw) < 0:
188 | # cxmin = 0
189 | # xmin = nw - w + dx
190 | # xmax = nw + dx
191 | # cxmax = xmax - xmin
192 | # else:
193 | # cxmin = dx
194 | # xmin = 0
195 | # xmax = nw
196 | # cxmax = nw + dx
197 | # if (h - nh) < 0:
198 | # cymin = 0
199 | # ymin = nh - h + dy
200 | # ymax = nh + dy
201 | # cymax = ymax - ymin
202 | # else:
203 | # cymin = dy
204 | # ymin = 0
205 | # ymax = nh
206 | # cymax = nh + dy
207 |
208 | # resized_image = resized_image[ymin:ymax,xmin:xmax,:]
209 |
210 | # boxes[:, 0::2] = (boxes[:, 0::2] * nw / img_w + dx) / w
211 | # boxes[:, 1::2] = (boxes[:, 1::2] * nh / img_h + dy ) / h
212 | # # clamp boxes
213 | # boxes[:, 0:2][boxes[:, 0:2]<=0] = 0
214 | # boxes[:, 2:][boxes[:, 2:]>=1] = 0.9999
215 |
216 | # canvas = np.full((resize_wh[1], resize_wh[0], 3), 128)
217 | # canvas[cymin:cymax, cxmin:cxmax, :] = resized_image
218 |
219 | # img_ = canvas[:,:,::-1].transpose((2,0,1)).copy()
220 | # img_ = torch.from_numpy(img_).float().div(255.0)
221 | # return img_, boxes
222 |
223 | def letterbox_image(img, resize_wh, boxes):
224 | '''resize image with unchanged aspect ratio using padding'''
225 | img_w, img_h = img.shape[1], img.shape[0]
226 | w, h = resize_wh
227 | new_w = int(img_w * min(w/img_w, h/img_h))
228 | new_h = int(img_h * min(w/img_w, h/img_h))
229 |
230 | if len(boxes) > 0:
231 | boxes = boxes.copy()
232 | dim_diff = np.abs(img_w - img_h)
233 | max_size = max(img_w, img_h)
234 | if img_w > img_h:
235 | boxes[:, 1::2] += dim_diff // 2
236 | else:
237 | boxes[:, 0::2] += dim_diff // 2
238 | boxes[:, 0::2] /= max_size
239 | boxes[:, 1::2] /= max_size
240 | resized_image = cv2.resize(img, (new_w, new_h), interpolation = cv2.INTER_CUBIC)
241 |     canvas = np.full((h, w, 3), 128)  # canvas shape is (height, width, channels)
242 | canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image
243 | img_ = canvas[:,:,::-1].transpose((2,0,1)).copy()
244 | img_ = torch.from_numpy(img_).float().div(255.0)
245 |
246 | return img_, boxes
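# Worked example (illustration only): letterboxing a 500x375 (w x h) image to
# resize_wh = (416, 416) resizes it to 416x312 and pads 52 gray (128) rows above
# and below, preserving the aspect ratio.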
247 |
248 |
249 | class preproc(object):
250 |
251 | def __init__(self):
252 | self.means = [128, 128, 128]
253 | self.p = 0.5
254 |
255 | def __call__(self, image, targets, resize_wh, use_pad=True):
256 | boxes = targets[:, :-1].copy()
257 | labels = targets[:, -1].copy()
258 | height, width, _ = image.shape
259 | if len(boxes) == 0:
260 | targets = np.zeros((1,5))
261 | image, _ = letterbox_image(image, resize_wh, boxes)
262 | return image, targets
263 | image_o = image.copy()
264 | targets_o = targets.copy()
265 | image_t, boxes, labels = _crop(image, boxes, labels)
266 | image_t = _distort(image_t)
267 | image_t, boxes = _expand(image_t, boxes, self.means, self.p)
268 | image_t, boxes = _mirror(image_t, boxes)
269 | image_t, boxes = letterbox_image(image_t, resize_wh, boxes)
270 |
271 | boxes = boxes.copy()
272 | b_w = (boxes[:, 2] - boxes[:, 0])*1.
273 | b_h = (boxes[:, 3] - boxes[:, 1])*1.
274 | mask_b= np.minimum(b_w, b_h) > 0.01
275 | boxes_t = boxes[mask_b]
276 | labels_t = labels[mask_b].copy()
277 |
278 | if len(boxes_t) == 0:
279 | boxes_t = targets_o[:, :4].copy()
280 | labels_t = targets_o[:, -1].copy()
281 | image_t, boxes_t = letterbox_image(image_o, resize_wh, boxes_t)
282 |
283 | boxes_t[:, 0::2] *= resize_wh[0]
284 | boxes_t[:, 1::2] *= resize_wh[1]
285 |
286 | labels_t = np.expand_dims(labels_t, 1)
287 | targets_t = np.hstack((boxes_t, labels_t))
288 |
289 | return image_t, targets_t
290 |
291 |
292 |
--------------------------------------------------------------------------------
/data/voc0712.py:
--------------------------------------------------------------------------------
1 | """VOC Dataset Classes
2 |
3 | Original author: Francisco Massa
4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py
5 |
6 | Updated by: Ellis Brown, Max deGroot
7 | """
8 |
9 | import os
10 | import os.path
11 | import pickle
12 | import sys
13 | import torch
14 | import torch.utils.data as data
15 | from PIL import Image, ImageDraw, ImageFont
16 | import cv2
17 | import numpy as np
18 | from .voc_eval import voc_eval
19 | import random
20 | from .data_augment import preproc
21 | if sys.version_info[0] == 2:
22 | import xml.etree.cElementTree as ET
23 | else:
24 | import xml.etree.ElementTree as ET
25 |
26 | VOC_CLASSES = (
27 | 'aeroplane', 'bicycle', 'bird', 'boat',
28 | 'bottle', 'bus', 'car', 'cat', 'chair',
29 | 'cow', 'diningtable', 'dog', 'horse',
30 | 'motorbike', 'person', 'pottedplant',
31 | 'sheep', 'sofa', 'train', 'tvmonitor')
32 |
33 | # for making bounding boxes pretty
34 | COLORS = ((255, 0, 0, 128), (0, 255, 0, 128), (0, 0, 255, 128),
35 | (0, 255, 255, 128), (255, 0, 255, 128), (255, 255, 0, 128))
36 |
37 |
38 | class VOCAnnotationTransform(object):
39 | """Transforms a VOC annotation into a Tensor of bbox coords and label index
40 | Initilized with a dictionary lookup of classnames to indexes
41 |
42 | Arguments:
43 | class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
44 | (default: alphabetic indexing of VOC's 20 classes)
45 | keep_difficult (bool, optional): keep difficult instances or not
46 | (default: False)
47 | height (int): height
48 | width (int): width
49 | """
50 |
51 | def __init__(self, class_to_ind=None, keep_difficult=False):
52 | self.class_to_ind = class_to_ind or dict(
53 | zip(VOC_CLASSES, range(len(VOC_CLASSES))))
54 | self.keep_difficult = keep_difficult
55 |
56 | def __call__(self, target, width, height):
57 | """
58 | Arguments:
59 | target (annotation) : the target annotation to be made usable
60 | will be an ET.Element
61 | Returns:
62 | a list containing lists of bounding boxes [bbox coords, class name]
63 | """
64 | res = np.empty((0,5))
65 | for obj in target.iter('object'):
66 | difficult = int(obj.find('difficult').text) == 1
67 | if not self.keep_difficult and difficult:
68 | continue
69 | name = obj.find('name').text.lower().strip()
70 | bbox = obj.find('bndbox')
71 |
72 | pts = ['xmin', 'ymin', 'xmax', 'ymax']
73 | bndbox = []
74 | for i, pt in enumerate(pts):
75 | cur_pt = int(bbox.find(pt).text)
76 | bndbox.append(cur_pt)
77 | label_idx = self.class_to_ind[name]
78 | bndbox.append(label_idx)
79 | res = np.vstack((res, bndbox))
80 | return res
81 |
82 |
83 | class VOCDetection(data.Dataset):
84 | """VOC Detection Dataset Object
85 |
86 | input is image, target is annotation
87 |
88 | Arguments:
89 | root (string): filepath to VOCdevkit folder.
90 | image_set (string): imageset to use (eg. 'train', 'val', 'test')
91 | transform (callable, optional): transformation to perform on the
92 | input image
93 | target_transform (callable, optional): transformation to perform on the
94 | target `annotation`
95 | (eg: take in caption string, return tensor of word indices)
96 | dataset_name (string, optional): which dataset to load
97 | (default: 'VOC2007')
98 | """
99 |
100 | def __init__(self, root, image_sets, resize_wh, batch_size=16, multiscale=False, dataset_name='VOC0712'):
101 | self.root = root
102 | self.image_set = image_sets
103 | self.transform = preproc()
104 | self.resize_wh = resize_wh
105 | self.target_transform = VOCAnnotationTransform()
106 | self.name = dataset_name
107 | self.multiscale = multiscale
108 | self.batch_size = batch_size
109 | self._annopath = os.path.join('%s', 'Annotations', '%s.xml')
110 | self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg')
111 | self.ids = list()
112 | for (year, name) in image_sets:
113 | self._year = year
114 | rootpath = os.path.join(self.root, 'VOC' + year)
115 | for line in open(os.path.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
116 | self.ids.append((rootpath, line.strip()))
117 |
118 | def __getitem__(self, index):
119 | img_id = self.ids[index]
120 | # multiscale train
121 | if self.multiscale:
122 | if index % (self.batch_size * 20) == 0:
123 | rnd = (random.randint(0,9) + 10) * 32
124 | print("resize scale", index, rnd)
125 | self.resize_wh = (rnd, rnd)
126 | target = ET.parse(self._annopath % img_id).getroot()
127 | img = cv2.imread(self._imgpath % img_id)
128 | height, width, channels = img.shape
129 | if self.target_transform is not None:
130 | target = self.target_transform(target, width, height)
131 |
132 | if self.transform is not None:
133 | img, target = self.transform(img, target, self.resize_wh)
134 |
135 |
136 | return img, target
137 |
138 |
139 | def __len__(self):
140 | return len(self.ids)
141 |
142 | def pull_image(self, index):
143 | '''Returns the original image object at index in PIL form
144 |
145 | Note: not using self.__getitem__(), as any transformations passed in
146 | could mess up this functionality.
147 |
148 | Argument:
149 | index (int): index of img to show
150 | Return:
151 | PIL img
152 | '''
153 | img_id = self.ids[index]
154 | return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR), img_id
155 |
156 | def pull_anno(self, index):
157 | '''Returns the original annotation of image at index
158 |
159 | Note: not using self.__getitem__(), as any transformations passed in
160 | could mess up this functionality.
161 |
162 | Argument:
163 | index (int): index of img to get annotation of
164 | Return:
165 | list: [img_id, [(label, bbox coords),...]]
166 | eg: ('001718', [('dog', (96, 13, 438, 332))])
167 | '''
168 | img_id = self.ids[index]
169 | anno = ET.parse(self._annopath % img_id).getroot()
170 | gt = self.target_transform(anno, 1, 1)
171 | return img_id[1], gt
172 |
173 | def pull_tensor(self, index):
174 | '''Returns the original image at an index in tensor form
175 |
176 | Note: not using self.__getitem__(), as any transformations passed in
177 | could mess up this functionality.
178 |
179 | Argument:
180 | index (int): index of img to show
181 | Return:
182 | tensorized version of img, squeezed
183 | '''
184 |         return torch.Tensor(self.pull_image(index)[0]).unsqueeze_(0)
185 |
186 | def evaluate_detections(self, all_boxes, output_dir=None):
187 | """
188 | all_boxes is a list of length number-of-classes.
189 | Each list element is a list of length number-of-images.
190 | Each of those list elements is either an empty list []
191 | or a numpy array of detection.
192 |
193 | all_boxes[class][image] = [] or np.array of shape #dets x 5
194 | """
195 | self._write_voc_results_file(all_boxes)
196 | self._do_python_eval(output_dir)
197 |
198 | def _get_voc_results_file_template(self):
199 | filename = 'comp3_det_test' + '_{:s}.txt'
200 | filedir = os.path.join(
201 | self.root, 'results', 'VOC' + self._year, 'Main')
202 | if not os.path.exists(filedir):
203 | os.makedirs(filedir)
204 | path = os.path.join(filedir, filename)
205 | return path
206 |
207 | def _write_voc_results_file(self, all_boxes):
208 | for cls_ind, cls in enumerate(VOC_CLASSES):
209 | print('Writing {} VOC results file'.format(cls))
210 | filename = self._get_voc_results_file_template().format(cls)
211 | # print(filename)
212 | with open(filename, 'wt') as f:
213 | for im_ind, index in enumerate(self.ids):
214 | index = index[1]
215 |                     dets = all_boxes[cls_ind][im_ind]
216 |                     if len(dets) == 0:
217 |                         continue
218 | for k in range(dets.shape[0]):
219 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
220 | format(index, dets[k, -1],
221 | dets[k, 0] + 1, dets[k, 1] + 1,
222 | dets[k, 2] + 1, dets[k, 3] + 1))
223 |
224 | def _do_python_eval(self, output_dir='output'):
225 | rootpath = os.path.join(self.root, 'VOC' + self._year)
226 | name = self.image_set[0][1]
227 | annopath = os.path.join(
228 | rootpath,
229 | 'Annotations',
230 | '{:s}.xml')
231 | imagesetfile = os.path.join(
232 | rootpath,
233 | 'ImageSets',
234 | 'Main',
235 | name+'.txt')
236 | cachedir = os.path.join(self.root, 'annotations_cache')
237 | aps = []
238 | # The PASCAL VOC metric changed in 2010
239 | use_07_metric = True if int(self._year) < 2010 else False
240 | # use_07_metric = True
241 | print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
242 | if output_dir is not None and not os.path.isdir(output_dir):
243 | os.mkdir(output_dir)
244 | for i, cls in enumerate(VOC_CLASSES):
245 |
246 | filename = self._get_voc_results_file_template().format(cls)
247 | rec, prec, ap = voc_eval(
248 | filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5,
249 | use_07_metric=use_07_metric)
250 | aps += [ap]
251 | print('AP for {} = {:.4f}'.format(cls, ap))
252 | if output_dir is not None:
253 | with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f:
254 | pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f)
255 | print('Mean AP = {:.4f}'.format(np.mean(aps)))
256 | print('~~~~~~~~')
257 | print('Results:')
258 | for ap in aps:
259 | print('{:.3f}'.format(ap))
260 | print('{:.3f}'.format(np.mean(aps)))
261 | print('~~~~~~~~')
262 | print('')
263 | print('--------------------------------------------------------------')
264 | print('Results computed with the **unofficial** Python eval code.')
265 | print('Results should be very close to the official MATLAB eval code.')
266 | print('Recompute with `./tools/reval.py --matlab ...` for your paper.')
267 | print('-- Thanks, The Management')
268 | print('--------------------------------------------------------------')
269 |
270 |
271 | def detection_collate(batch):
272 | """Custom collate fn for dealing with batches of images that have a different
273 | number of associated object annotations (bounding boxes).
274 |
275 | Arguments:
276 | batch: (tuple) A tuple of tensor images and lists of annotations
277 |
278 | Return:
279 | A tuple containing:
280 | 1) (tensor) batch of images stacked on their 0 dim
281 | 2) (list of tensors) annotations for a given image are stacked on 0 dim
282 | """
283 | targets = []
284 | imgs = []
285 | for sample in batch:
286 | imgs.append(sample[0])
287 | targets.append(torch.FloatTensor(sample[1]))
288 | return torch.stack(imgs, 0), targets
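# Minimal usage sketch (assumes the merged VOC0712 data sits under VOCroot from data/config.py):
#
#   from torch.utils.data import DataLoader
#   dataset = VOCDetection(VOCroot, [('0712', '0712_trainval')], (416, 416))
#   loader = DataLoader(dataset, batch_size=16, shuffle=True, collate_fn=detection_collate)
#   images, targets = next(iter(loader))
#   # images: FloatTensor [16, 3, 416, 416]; targets: list of 16 [num_objects, 5] tensors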
289 |
290 |
--------------------------------------------------------------------------------
/data/voc_eval.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast/er R-CNN
3 | # Licensed under The MIT License [see LICENSE for details]
4 | # Written by Bharath Hariharan
5 | # --------------------------------------------------------
6 |
7 | import xml.etree.ElementTree as ET
8 | import os
9 | import pickle
10 | import numpy as np
11 | import pdb
12 |
13 |
14 | def parse_rec(filename):
15 | """ Parse a PASCAL VOC xml file """
16 | tree = ET.parse(filename)
17 | objects = []
18 | for obj in tree.findall('object'):
19 | obj_struct = {}
20 | obj_struct['name'] = obj.find('name').text
21 | obj_struct['pose'] = obj.find('pose').text
22 | obj_struct['truncated'] = int(obj.find('truncated').text)
23 | obj_struct['difficult'] = int(obj.find('difficult').text)
24 | bbox = obj.find('bndbox')
25 | obj_struct['bbox'] = [int(bbox.find('xmin').text),
26 | int(bbox.find('ymin').text),
27 | int(bbox.find('xmax').text),
28 | int(bbox.find('ymax').text)]
29 | objects.append(obj_struct)
30 |
31 | return objects
32 |
33 |
34 |
35 | def voc_ap(rec, prec, use_07_metric=False):
36 | """ ap = voc_ap(rec, prec, [use_07_metric])
37 | Compute VOC AP given precision and recall.
38 | If use_07_metric is true, uses the
39 | VOC 07 11 point method (default:False).
40 | """
41 | if use_07_metric:
42 | # 11 point metric
43 | ap = 0.
44 | for t in np.arange(0., 1.1, 0.1):
45 | if np.sum(rec >= t) == 0:
46 | p = 0
47 | else:
48 | p = np.max(prec[rec >= t])
49 | ap = ap + p / 11.
50 | else:
51 | # correct AP calculation
52 | # first append sentinel values at the end
53 | mrec = np.concatenate(([0.], rec, [1.]))
54 | mpre = np.concatenate(([0.], prec, [0.]))
55 |
56 | # compute the precision envelope
57 | for i in range(mpre.size - 1, 0, -1):
58 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
59 |
60 | # to calculate area under PR curve, look for points
61 | # where X axis (recall) changes value
62 | i = np.where(mrec[1:] != mrec[:-1])[0]
63 |
64 | # and sum (\Delta recall) * prec
65 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
66 | return ap
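# Worked example (toy values, illustration only): with rec = [0.5, 1.0] and
# prec = [1.0, 0.5] and use_07_metric=False, the precision envelope over the
# recall points [0.0, 0.5, 1.0, 1.0] is [1.0, 1.0, 0.5, 0.0], so
# ap = 0.5 * 1.0 + 0.5 * 0.5 = 0.75.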
67 |
68 | def voc_eval(detpath,
69 | annopath,
70 | imagesetfile,
71 | classname,
72 | cachedir,
73 | ovthresh=0.5,
74 | use_07_metric=False):
75 | """rec, prec, ap = voc_eval(detpath,
76 | annopath,
77 | imagesetfile,
78 | classname,
79 | [ovthresh],
80 | [use_07_metric])
81 |
82 | Top level function that does the PASCAL VOC evaluation.
83 |
84 | detpath: Path to detections
85 | detpath.format(classname) should produce the detection results file.
86 | annopath: Path to annotations
87 | annopath.format(imagename) should be the xml annotations file.
88 | imagesetfile: Text file containing the list of images, one image per line.
89 | classname: Category name (duh)
90 | cachedir: Directory for caching the annotations
91 | [ovthresh]: Overlap threshold (default = 0.5)
92 | [use_07_metric]: Whether to use VOC07's 11 point AP computation
93 | (default False)
94 | """
95 | # assumes detections are in detpath.format(classname)
96 | # assumes annotations are in annopath.format(imagename)
97 | # assumes imagesetfile is a text file with each line an image name
98 | # cachedir caches the annotations in a pickle file
99 |
100 | # first load gt
101 | if not os.path.isdir(cachedir):
102 | os.mkdir(cachedir)
103 | cachefile = os.path.join(cachedir, imagesetfile.split(".")[0]+'_annots.pkl')
104 | # read list of images
105 | with open(imagesetfile, 'r') as f:
106 | lines = f.readlines()
107 | imagenames = [x.strip() for x in lines]
108 |
109 | if not os.path.isfile(cachefile):
110 | # load annots
111 | recs = {}
112 | for i, imagename in enumerate(imagenames):
113 | recs[imagename] = parse_rec(annopath.format(imagename))
114 | if i % 100 == 0:
115 | print('Reading annotation for {:d}/{:d}'.format(
116 | i + 1, len(imagenames)))
117 | # save
118 | print('Saving cached annotations to {:s}'.format(cachefile))
119 | with open(cachefile, 'wb') as f:
120 | pickle.dump(recs, f)
121 | else:
122 | # load
123 | with open(cachefile, 'rb') as f:
124 | recs = pickle.load(f)
125 |
126 | # extract gt objects for this class
127 | class_recs = {}
128 | npos = 0
129 | for imagename in imagenames:
130 | R = [obj for obj in recs[imagename] if obj['name'] == classname]
131 | bbox = np.array([x['bbox'] for x in R])
132 | difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
133 | det = [False] * len(R)
134 | npos = npos + sum(~difficult)
135 | class_recs[imagename] = {'bbox': bbox,
136 | 'difficult': difficult,
137 | 'det': det}
138 |
139 | # read dets
140 | detfile = detpath.format(classname)
141 | with open(detfile, 'r') as f:
142 | lines = f.readlines()
143 |
144 | splitlines = [x.strip().split(' ') for x in lines]
145 | image_ids = [x[0] for x in splitlines]
146 | confidence = np.array([float(x[1]) for x in splitlines])
147 | BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
148 | # sort by confidence
149 | sorted_ind = np.argsort(-confidence)
150 | sorted_scores = np.sort(-confidence)
151 | BB = BB[sorted_ind, :]
152 | image_ids = [image_ids[x] for x in sorted_ind]
153 |
154 | # go down dets and mark TPs and FPs
155 | nd = len(image_ids)
156 | tp = np.zeros(nd)
157 | fp = np.zeros(nd)
158 | for d in range(nd):
159 | R = class_recs[image_ids[d]]
160 | bb = BB[d, :].astype(float)
161 | ovmax = -np.inf
162 | BBGT = R['bbox'].astype(float)
163 |
164 | if BBGT.size > 0:
165 | # compute overlaps
166 | # intersection
167 | ixmin = np.maximum(BBGT[:, 0], bb[0])
168 | iymin = np.maximum(BBGT[:, 1], bb[1])
169 | ixmax = np.minimum(BBGT[:, 2], bb[2])
170 | iymax = np.minimum(BBGT[:, 3], bb[3])
171 | iw = np.maximum(ixmax - ixmin + 1., 0.)
172 | ih = np.maximum(iymax - iymin + 1., 0.)
173 | inters = iw * ih
174 |
175 | # union
176 | uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
177 | (BBGT[:, 2] - BBGT[:, 0] + 1.) *
178 | (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
179 |
180 | overlaps = inters / uni
181 | ovmax = np.max(overlaps)
182 | jmax = np.argmax(overlaps)
183 |
184 | if ovmax > ovthresh:
185 | if not R['difficult'][jmax]:
186 | if not R['det'][jmax]:
187 | tp[d] = 1.
188 | R['det'][jmax] = 1
189 | else:
190 | fp[d] = 1.
191 | else:
192 | fp[d] = 1.
193 |
194 | # compute precision recall
195 | fp = np.cumsum(fp)
196 | tp = np.cumsum(tp)
197 | rec = tp / float(npos)
198 | # avoid divide by zero in case the first detection matches a difficult
199 | # ground truth
200 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
201 | ap = voc_ap(rec, prec, use_07_metric)
202 |
203 | return rec, prec, ap
204 |
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 |
4 | from __future__ import division
5 | import time
6 | import torch
7 | import os
8 | os.environ["CUDA_VISIBLE_DEVICES"] = "1"
9 | import torch.nn as nn
10 | import torch.backends.cudnn as cudnn
11 | import numpy as np
12 | import cv2
13 | import argparse
14 | import os.path as osp
15 | import math
16 | from model.yolo import Yolov3
17 | from utils.box_utils import draw_rects, detection_postprecess
18 | from data.config import voc_config, coco_config
19 | from utils.preprocess import preproc_for_test
20 |
21 | def arg_parse():
22 | """
23 |     Parse arguments to the detect module
24 | """
25 | parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')
26 |
27 | parser.add_argument("--images", dest = 'images', help =
28 | "Image / Directory containing images to perform detection upon",default = "images", type = str)
29 |     parser.add_argument("--confidence", dest = "confidence", type = float, help = "Object Confidence to filter predictions", default = 0.1)
30 |     parser.add_argument("--nms_thresh", dest = "nms_thresh", type = float, help = "NMS Threshold", default = 0.4)
31 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416])
32 |     parser.add_argument("--save_path", dest = "save_path", help = "path to save the output images", default = './output')
33 | parser.add_argument("--dataset", dest = "dataset", help = "VOC or COCO", default = 'VOC')
34 | parser.add_argument("--weights", dest = 'weights',
35 | help = "weightsfile",
36 | default = "./weights/convert_yolov3_coco.pth", type = str)
37 | parser.add_argument('--cuda', default=True, type=str,
38 | help='Use cuda to train model')
39 | parser.add_argument('--use_pad', default=True, type=str,
40 | help='Use pad to resize images')
41 | return parser.parse_args()
42 |
43 |
44 | if __name__ == '__main__':
45 | args = arg_parse()
46 | weightsfile = args.weights
47 | confidence = args.confidence
48 | nms_thresh = args.nms_thresh
49 | images = args.images
50 | input_wh = args.input_wh
51 | cuda = args.cuda
52 | use_pad = args.use_pad
53 | save_path = args.save_path
54 | dataset = args.dataset
55 | if dataset[0] == "V":
56 | cfg = voc_config
57 |     elif dataset[0] == "C":
58 | cfg = coco_config
59 | else:
60 | print("only support VOC and COCO datasets !!!")
61 | name_path = cfg["name_path"]
62 | num_classes = cfg["num_classes"]
63 | anchors = cfg["anchors"]
64 |
65 | with open(name_path, "r") as f:
66 | classes = [i.strip() for i in f.readlines()]
67 | try:
68 | im_list = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
69 | except NotADirectoryError:
70 | im_list = []
71 | im_list.append(osp.join(osp.realpath('.'), images))
72 | except FileNotFoundError:
73 | print ("No file or directory with the name {}".format(images))
74 | exit()
75 |
76 | net = Yolov3("test", input_wh, anchors, cfg["anchors_mask"], num_classes)
77 | state_dict = torch.load(weightsfile)
78 | from collections import OrderedDict
79 | new_state_dict = OrderedDict()
80 | for k, v in state_dict.items():
81 | head = k[:7]
82 | if head == 'module.':
83 | name = k[7:] # remove `module.`
84 | else:
85 | name = k
86 | new_state_dict[name] = v
87 | if cuda:
88 | net.cuda()
89 | cudnn.benchmark = True
90 | net.load_state_dict(new_state_dict)
91 | print("load weights successfully.....")
92 | net.eval()
93 | for img_path in im_list[:]:
94 | print(img_path)
95 | img = cv2.imread(img_path)
96 | ori_img = img.copy()
97 | ori_wh = (img.shape[1], img.shape[0])
98 | img = preproc_for_test(img, input_wh, use_pad)
99 | if cuda:
100 | img = img.cuda()
101 | st = time.time()
102 | detection = net(img)
103 | detect_time = time.time()
104 | detection = detection_postprecess(detection, confidence, num_classes, input_wh, ori_wh, use_pad=use_pad, nms_conf=nms_thresh)
105 | nms_time = time.time()
106 | draw_img = draw_rects(ori_img, detection, classes)
107 | draw_time = time.time()
108 | save_img_path = os.path.join(save_path, "output_" + img_path.split("/")[-1])
109 | cv2.imwrite(save_img_path, draw_img)
110 | final_time = time.time() - st
111 |
112 | print("detection time:", round(detect_time - st, 3), "nms_time:", round(nms_time - detect_time, 3), "draw_time:", round(draw_time - nms_time, 3), "final_time:", round(final_time ,3))
113 |
114 |
115 |
116 |
117 |
118 |
119 |
120 |
121 |
122 |
--------------------------------------------------------------------------------
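demo.py, eval.py and train.py all clean up checkpoint keys before `load_state_dict`: weights saved from an `nn.DataParallel` wrapper carry a `module.` prefix that a plain single-GPU model does not expect. A minimal sketch of that pattern as a reusable helper (the usage path below is illustrative only):

```python
from collections import OrderedDict
import torch

def strip_module_prefix(state_dict):
    # drop the "module." prefix added by nn.DataParallel, leave other keys untouched
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = k[7:] if k.startswith('module.') else k
        new_state_dict[name] = v
    return new_state_dict

# hypothetical usage:
# net.load_state_dict(strip_module_prefix(torch.load("./weights/some_checkpoint.pth")))
```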
/eval.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 |
4 | from __future__ import division
5 | import time
6 | import torch
7 | import os
8 | os.environ["CUDA_VISIBLE_DEVICES"] = "0"
9 | import torch.nn as nn
10 | from torch.autograd import Variable
11 | import torch.backends.cudnn as cudnn
12 | import numpy as np
13 | import cv2
14 | import argparse
15 | import os.path as osp
16 | import math
17 | import pickle
18 | from model.yolo import Yolov3
19 | from data.voc0712 import VOCDetection, detection_collate
20 | from data.coco import COCODetection
21 | from data.config import voc_config, coco_config, datasets_dict
22 | from utils.box_utils import draw_rects, detection_postprecess
23 | from utils.timer import Timer
24 | from utils.preprocess import preproc_for_test
25 |
26 | def arg_parse():
27 | """
28 |     Parse arguments to the detect module
29 |
30 | """
31 | parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')
32 |
33 | parser.add_argument('--dataset', default='VOC',
34 | help='VOC ,VOC0712++ or COCO dataset')
35 |     parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshold", default = 0.4, type = float)
36 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416])
37 | parser.add_argument("--weights", dest = 'weights',
38 | help = "weightsfile",
39 | default = "./weights/yolov3_COCO_epoches_10_0607.pth", type = str)
40 | parser.add_argument('--cuda', default=True, type=str,
41 | help='Use cuda to train model')
42 | parser.add_argument('--use_pad', default=True, type=str,
43 | help='Use pad to resize images')
44 | parser.add_argument('--retest', default=False, type=bool,
45 | help='test cache results')
46 | parser.add_argument('--save_folder', default='./eval/',
47 | help='results path')
48 | return parser.parse_args()
49 |
50 | def test_net(cfg, save_folder, input_wh, net, cuda, testset,
51 | max_per_image=300, thresh=0.05, nms_conf=0.4):
52 |     """Test the YOLOv3 network on an image database."""
53 | num_images = len(testset)
54 | # all detections are collected into:
55 | # all_boxes[cls][image] = N x 5 array of detections in
56 | # (x1, y1, x2, y2, score)
57 | num_images = len(testset)
58 | num_classes = cfg["num_classes"]
59 | all_boxes = [[[] for _ in range(num_images)]
60 | for _ in range(num_classes)]
61 |
62 | if not os.path.exists(save_folder):
63 | os.mkdir(save_folder)
64 | # timers
65 | _t = {'im_detect': Timer(), 'misc': Timer()}
66 | det_file = os.path.join(save_folder, 'detections.pkl')
67 |
68 | if args.retest:
69 | f = open(det_file,'rb')
70 | all_boxes = pickle.load(f)
71 | print('Evaluating detections')
72 | testset.evaluate_detections(all_boxes, save_folder)
73 | return
74 |
75 | for i in range(num_images):
76 | img, img_id = testset.pull_image(i)
77 | ori_wh = (img.shape[1], img.shape[0])
78 | img = preproc_for_test(img, input_wh, use_pad)
79 | x = img
80 | if cuda:
81 | x = x.cuda()
82 |
83 | _t['im_detect'].tic()
84 | out = net(x) # forward pass
85 | detections = detection_postprecess(out, thresh, num_classes, input_wh, ori_wh, use_pad=use_pad, nms_conf=nms_conf)
86 | boxes, scores, cls_inds = detections[:, :4], detections[:,4], detections[:, -1]
87 | detect_time = _t['im_detect'].toc()
88 | if len(boxes) == 0:
89 | continue
90 |
91 | _t['misc'].tic()
92 | for j in range(num_classes):
93 | inds = np.where(cls_inds == j)[0]
94 | if len(inds) == 0:
95 | all_boxes[j][i] = np.empty([0, 5], dtype=np.float32)
96 | continue
97 | c_bboxes = boxes[inds]
98 | c_scores = scores[inds]
99 | c_dets = np.hstack((c_bboxes, c_scores[:, np.newaxis])).astype(
100 | np.float32, copy=False)
101 | all_boxes[j][i] = c_dets
102 |
103 | if max_per_image > 0:
104 | image_scores = np.hstack([all_boxes[j][i][:, -1] for j in range(num_classes)])
105 | if len(image_scores) > max_per_image:
106 | image_thresh = np.sort(image_scores)[-max_per_image]
107 | for j in range(num_classes):
108 | keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0]
109 | all_boxes[j][i] = all_boxes[j][i][keep, :]
110 | nms_time = _t['misc'].toc()
111 |
112 | if i % 20 == 0:
113 | print('im_detect: {:d}/{:d} {:.3f}s {:.3f}s'
114 | .format(i + 1, num_images, detect_time, nms_time))
115 | _t['im_detect'].clear()
116 | _t['misc'].clear()
117 |
118 | with open(det_file, 'wb') as f:
119 | pickle.dump(all_boxes, f, pickle.HIGHEST_PROTOCOL)
120 | print('Evaluating detections')
121 | testset.evaluate_detections(all_boxes, save_folder)
122 |
123 | if __name__ == '__main__':
124 | args = arg_parse()
125 | weightsfile = args.weights
126 | nms_thresh = args.nms_thresh
127 | input_wh = args.input_wh
128 | cuda = args.cuda
129 | use_pad = args.use_pad
130 | save_folder = args.save_folder
131 | dataset = args.dataset
132 | if dataset[0] == "V":
133 | cfg = voc_config
134 | test_dataset = VOCDetection(cfg["root"], datasets_dict["VOC2007"], input_wh)
135 | elif dataset[0] == "C":
136 | cfg = coco_config
137 | test_dataset = COCODetection(cfg["root"], datasets_dict["COCOval"], input_wh)
138 | else:
139 | print("only support VOC and COCO datasets !!!")
140 |
141 | print("load test_dataset successfully.....")
142 |
143 | with open(cfg["name_path"], "r") as f:
144 | classes = [i.strip() for i in f.readlines()]
145 |
146 | net = Yolov3("test", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"])
147 | state_dict = torch.load(weightsfile)
148 | from collections import OrderedDict
149 | new_state_dict = OrderedDict()
150 | for k, v in state_dict.items():
151 | head = k[:7]
152 | if head == 'module.':
153 | name = k[7:] # remove `module.`
154 | else:
155 | name = k
156 | new_state_dict[name] = v
157 |
158 | if cuda:
159 | net.cuda()
160 | cudnn.benchmark = True
161 | net.load_state_dict(new_state_dict)
162 | print("load weights successfully.....")
163 | net.eval()
164 |
165 | top_k = 200
166 | confidence = 0.01
167 | test_net(cfg, save_folder, input_wh, net, args.cuda, test_dataset, top_k, confidence, nms_thresh)
168 |
169 |
170 |
171 |
172 |
173 |
174 |
175 |
176 |
--------------------------------------------------------------------------------
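The `max_per_image` block in `test_net` caps the detections kept per image: it pools the scores of every class, finds the score of the k-th best detection, and drops everything below it in each class. A minimal sketch with toy data:

```python
import numpy as np

max_per_image = 3
# hypothetical per-class detections for one image: columns are x1, y1, x2, y2, score
dets_per_class = [
    np.array([[0, 0, 10, 10, 0.9], [1, 1, 9, 9, 0.2]], dtype=np.float32),
    np.array([[5, 5, 20, 20, 0.8], [2, 2, 7, 7, 0.6]], dtype=np.float32),
]
image_scores = np.hstack([d[:, -1] for d in dets_per_class])
if len(image_scores) > max_per_image:
    image_thresh = np.sort(image_scores)[-max_per_image]
    dets_per_class = [d[d[:, -1] >= image_thresh] for d in dets_per_class]
print([len(d) for d in dets_per_class])  # -> [1, 2]: the 0.2 detection is dropped
```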
/images/dog.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/dog.jpg
--------------------------------------------------------------------------------
/images/eagle.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/eagle.jpg
--------------------------------------------------------------------------------
/images/person.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/images/person.jpg
--------------------------------------------------------------------------------
/layers/multiyolo_loss.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 | #
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.autograd import Variable
8 | import numpy as np
9 | from .weight_mseloss import WeightMseLoss
10 | from utils.box_utils import targets_match_all, permute_sigmoid, decode
11 |
12 | class MultiYoloLoss(nn.Module):
13 |
14 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask, use_gpu=True):
15 | super(MultiYoloLoss, self).__init__()
16 | self.num_classes = num_classes
17 | self.ignore_thresh = ignore_thresh
18 | self.use_gpu = use_gpu
19 | self.anchors = anchors
20 | self.mse_loss = nn.MSELoss(size_average=False)
21 | self.bce_loss = nn.BCELoss(size_average=False)
22 | self.weight_mseloss = WeightMseLoss(size_average=False)
23 | self.input_wh = input_wh
24 | self.anchors_mask = anchors_mask
25 |
26 | def forward(self, x, targets, input_wh, debug=False):
27 | self.input_wh = input_wh
28 | l_data, m_data, h_data = x
29 | l_grid_wh = (l_data.size(3), l_data.size(2))
30 | m_grid_wh = (m_data.size(3), m_data.size(2))
31 | h_grid_wh = (h_data.size(3), h_data.size(2))
32 | feature_dim = (l_grid_wh, m_grid_wh, h_grid_wh)
33 | batch_size = l_data.size(0)
34 | pred_l, stride_l = permute_sigmoid(l_data, self.input_wh, 3, self.num_classes)
35 | pred_m, stride_m = permute_sigmoid(m_data, self.input_wh, 3, self.num_classes)
36 | pred_h, stride_h = permute_sigmoid(h_data, self.input_wh, 3, self.num_classes)
37 | pred = torch.cat((pred_l, pred_m, pred_h), 1)
38 |
39 | anchors1 = self.anchors[self.anchors_mask[0][0]: self.anchors_mask[0][-1]+1]
40 | anchors2 = self.anchors[self.anchors_mask[1][0]: self.anchors_mask[1][-1]+1]
41 | anchors3 = self.anchors[self.anchors_mask[2][0]: self.anchors_mask[2][-1]+1]
42 |
43 | decode_l = decode(pred_l.new_tensor(pred_l).detach(), self.input_wh, anchors1, self.num_classes, stride_l)
44 | decode_m = decode(pred_m.new_tensor(pred_m).detach(), self.input_wh, anchors2, self.num_classes, stride_m)
45 | decode_h = decode(pred_h.new_tensor(pred_h).detach(), self.input_wh, anchors3, self.num_classes, stride_h)
46 | decode_pred = torch.cat((decode_l, decode_m, decode_h), 1)
47 |
48 | num_pred = pred_l.size(1) + pred_m.size(1) + pred_h.size(1)
49 |
50 | # prediction targets x,y,w,h,objectness, class
51 | pred_t = torch.Tensor(batch_size, num_pred, 6).cuda()
52 |         # xywh scale, scale = 2 - truth.w * truth.h (if truth is normalized to 1)
53 | scale_t = torch.FloatTensor(batch_size, num_pred).cuda()
54 | # foreground targets mask
55 | fore_mask_t = torch.ByteTensor(batch_size, num_pred).cuda()
56 |
57 | # background targets mask, we only calculate the objectness pred loss
58 | back_mask_t = torch.ByteTensor(batch_size, num_pred).cuda()
59 |
60 | for idx in range(batch_size):
61 | # match all targets
62 | targets_match_all(self.input_wh, self.ignore_thresh, targets[idx], decode_pred[idx][:, :4], self.anchors, feature_dim, pred_t, scale_t, fore_mask_t, back_mask_t, num_pred, idx)
63 |
64 | scale_factor = scale_t[fore_mask_t].view(-1, 1)
65 | scale_factor = scale_factor.expand((scale_factor.size(0), 2))
66 | cls_t = pred_t[..., 5][fore_mask_t].long().view(-1, 1)
67 | cls_pred = pred[..., 5:]
68 |
69 | # cls loss
70 | cls_fore_mask_t = fore_mask_t.new_tensor(fore_mask_t).view(batch_size, num_pred, 1).expand_as(cls_pred)
71 | cls_pred = cls_pred[cls_fore_mask_t].view(-1, self.num_classes)
72 | class_mask = cls_pred.data.new(cls_t.size(0), self.num_classes).fill_(0)
73 | class_mask.scatter_(1, cls_t, 1.)
74 | cls_loss = self.bce_loss(cls_pred, class_mask)
75 | ave_cls = (class_mask * cls_pred).sum().item() / cls_pred.size(0)
76 |
77 | # conf loss
78 | conf_t = pred_t[..., 4]
79 | fore_conf_t = conf_t[fore_mask_t].view(-1, 1)
80 | back_conf_t = conf_t[back_mask_t].view(-1, 1)
81 | fore_conf_pred = pred[..., 4][fore_mask_t].view(-1, 1)
82 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1)
83 | fore_num = fore_conf_pred.size(0)
84 | back_num = back_conf_pred.size(0)
85 | Obj = fore_conf_pred.sum().item() / fore_num
86 | no_obj = back_conf_pred.sum().item() / back_num
87 |
88 | fore_conf_loss = self.bce_loss(fore_conf_pred, fore_conf_t)
89 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t)
90 | conf_loss = fore_conf_loss + back_conf_loss
91 |
92 | # loc loss
93 | loc_pred = pred[..., :4]
94 | loc_t = pred_t[..., :4]
95 | fore_mask_t = fore_mask_t.view(batch_size, num_pred, 1).expand_as(loc_pred)
96 | loc_t = loc_t[fore_mask_t].view(-1, 4)
97 | loc_pred = loc_pred[fore_mask_t].view(-1, 4)
98 |
99 | xy_t, wh_t = loc_t[:, :2], loc_t[:, 2:]
100 | xy_pred, wh_pred = loc_pred[:, :2], loc_pred[:, 2:]
101 | # xy_loss = F.binary_cross_entropy(xy_pred, xy_t, scale_factor, size_average=False)
102 |
103 | xy_loss = self.weight_mseloss(xy_pred, xy_t, scale_factor) / 2
104 | wh_loss = self.weight_mseloss(wh_pred, wh_t, scale_factor) / 2
105 |
106 | loc_loss = xy_loss + wh_loss
107 |
108 | loc_loss /= batch_size
109 | conf_loss /= batch_size
110 | cls_loss /= batch_size
111 |
112 | if debug:
113 | print("xy_loss", round(xy_loss.item(), 5), "wh_loss", round(wh_loss.item(), 5), "cls_loss", round(cls_loss.item(), 5), "ave_cls", round(ave_cls, 5), "Obj", round(Obj, 5), "no_obj", round(no_obj, 5), "fore_conf_loss", round(fore_conf_loss.item(), 5),
114 | "back_conf_loss", round(back_conf_loss.item(), 5))
115 |
116 | loss = loc_loss + conf_loss + cls_loss
117 |
118 | return loss
119 |
120 |
121 |
122 |
123 |
124 |
125 |
126 |
--------------------------------------------------------------------------------
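The class loss above builds a one-hot target with `scatter_` and feeds it to `BCELoss` together with the sigmoid class outputs. A minimal sketch with made-up numbers (`size_average=False` matches the repo's PyTorch 0.4-era API):

```python
import torch
import torch.nn as nn

num_classes = 4
cls_t = torch.tensor([[2], [0], [3]])                 # hypothetical matched class labels, shape (N, 1)
cls_pred = torch.sigmoid(torch.randn(3, num_classes)) # stands in for the sigmoid class outputs
class_mask = cls_pred.new_zeros(cls_t.size(0), num_classes)
class_mask.scatter_(1, cls_t, 1.)                     # one-hot rows: [0,0,1,0], [1,0,0,0], [0,0,0,1]
cls_loss = nn.BCELoss(size_average=False)(cls_pred, class_mask)
print(class_mask, cls_loss.item())
```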
/layers/weight_mseloss.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 |
4 |
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | from torch.autograd import Variable
9 |
10 |
11 | class WeightMseLoss(nn.Module):
12 | def __init__(self, size_average=True):
13 | super(WeightMseLoss, self).__init__()
14 | self.size_average = size_average
15 |
16 | def forward(self, inputs, targets, weights):
17 | ''' inputs is N * C
18 | targets is N * C
19 | weights is N * C
20 | '''
21 | N = inputs.size(0)
22 | C = inputs.size(1)
23 |
24 | out = (targets - inputs)
25 | out = weights * torch.pow(out, 2)
26 | loss = out.sum()
27 |
28 | if self.size_average:
29 | loss = loss / (N * C)
30 | return loss
31 |
--------------------------------------------------------------------------------
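`WeightMseLoss` is simply a per-element weighted squared error; in the YOLO layers the weight is the `scale = 2 - w*h` factor, so small ground-truth boxes contribute more to the localization loss. A minimal sketch with made-up tensors:

```python
import torch

inputs  = torch.tensor([[0.5, 0.5], [0.2, 0.8]])
targets = torch.tensor([[0.6, 0.4], [0.2, 0.9]])
weights = torch.tensor([[1.9, 1.9], [1.2, 1.2]])   # hypothetical 2 - w*h scales

loss = (weights * (targets - inputs) ** 2).sum()   # size_average=False behaviour
print(loss.item())                                 # 1.9*(0.01+0.01) + 1.2*(0.0+0.01) ≈ 0.05
```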
/layers/yolo_layer.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 | #
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.autograd import Variable
8 | import numpy as np
9 | import math
10 | from .weight_mseloss import WeightMseLoss
11 | from utils.box_utils import targets_match_single, permute_sigmoid, decode
12 |
13 | class YoloLayer(nn.Module):
14 |
15 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask,use_gpu=True):
16 | super(YoloLayer, self).__init__()
17 | self.num_classes = num_classes
18 | self.ignore_thresh = ignore_thresh
19 | self.use_gpu = use_gpu
20 | self.anchors = anchors
21 | self.anchors_mask = anchors_mask
22 | self.input_wh = input_wh
23 | self.mse_loss = nn.MSELoss(size_average=False)
24 | self.bce_loss = nn.BCELoss(size_average=False)
25 | self.weight_mseloss = WeightMseLoss(size_average=False)
26 |
27 | def forward(self, x, targets, input_wh, debug=False):
28 | self.input_wh = input_wh
29 | batch_size = x.size(0)
30 |         # feature map size (w, h); this produces w x h cells to predict
31 | grid_wh = (x.size(3), x.size(2))
32 | x, stride = permute_sigmoid(x, input_wh, 3, self.num_classes)
33 | pred = x
34 | num_pred = pred.size(1)
35 |
36 | decode_pred = decode(pred.new_tensor(pred).detach(), self.input_wh, self.anchors[self.anchors_mask[0]: self.anchors_mask[-1]+1], self.num_classes, stride)
37 |
38 | # prediction targets x,y,w,h,objectness, class
39 | pred_t = torch.Tensor(batch_size, num_pred, 6).cuda()
40 |         # xywh scale, scale = 2 - truth.w * truth.h (if truth is normalized to 1)
41 | scale_t = torch.FloatTensor(batch_size, num_pred).cuda()
42 | # foreground targets mask
43 | fore_mask_t = torch.ByteTensor(batch_size, num_pred).cuda()
44 |
45 | # background targets mask, we only calculate the objectness pred loss
46 | back_mask_t = torch.ByteTensor(batch_size, num_pred).cuda()
47 |
48 | for idx in range(batch_size):
49 | # match our targets
50 | targets_match_single(self.input_wh, self.ignore_thresh, targets[idx], decode_pred[idx][:, :4], self.anchors, self.anchors_mask, pred_t, scale_t, fore_mask_t, back_mask_t, grid_wh, idx)
51 |
52 | cls_t = pred_t[..., 5][fore_mask_t].long().view(-1, 1)
53 | cls_pred = pred[..., 5:]
54 | conf_t = pred_t[..., 4]
55 | if cls_t.size(0) == 0:
56 | print("grid_wh {} no matching anchors".format(grid_wh))
57 | back_conf_t = conf_t[back_mask_t].view(-1, 1)
58 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1)
59 | back_num = back_conf_pred.size(0)
60 | no_obj = back_conf_pred.sum().item() / back_num
61 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t)
62 | if debug:
63 | print("grid_wh", grid_wh, "loc_loss", 0, "conf_loss", round(back_conf_loss.item(), 5), "cls_loss", 0, "Obj", 0, "no_obj", round(no_obj, 5))
64 | return torch.zeros(1), back_conf_loss, torch.zeros(1)
65 |
66 | scale_factor = scale_t[fore_mask_t].view(-1, 1)
67 | scale_factor = scale_factor.expand((scale_factor.size(0), 2))
68 |
69 | # cls loss
70 | cls_fore_mask_t = fore_mask_t.new_tensor(fore_mask_t).view(batch_size, num_pred, 1).expand_as(cls_pred)
71 | cls_pred = cls_pred[cls_fore_mask_t].view(-1, self.num_classes)
72 | class_mask = cls_pred.data.new(cls_t.size(0), self.num_classes).fill_(0)
73 | class_mask.scatter_(1, cls_t, 1.)
74 | cls_loss = self.bce_loss(cls_pred, class_mask)
75 | ave_cls = (class_mask * cls_pred).sum().item() / cls_pred.size(0)
76 |
77 | # conf loss
78 | fore_conf_t = conf_t[fore_mask_t].view(-1, 1)
79 | back_conf_t = conf_t[back_mask_t].view(-1, 1)
80 | fore_conf_pred = pred[..., 4][fore_mask_t].view(-1, 1)
81 | back_conf_pred = pred[..., 4][back_mask_t].view(-1, 1)
82 | fore_num = fore_conf_pred.size(0)
83 | back_num = back_conf_pred.size(0)
84 | Obj = fore_conf_pred.sum().item() / fore_num
85 | no_obj = back_conf_pred.sum().item() / back_num
86 |
87 | fore_conf_loss = self.bce_loss(fore_conf_pred, fore_conf_t)
88 | back_conf_loss = self.bce_loss(back_conf_pred, back_conf_t)
89 | conf_loss = fore_conf_loss + back_conf_loss
90 |
91 | # loc loss
92 | loc_pred = pred[..., :4]
93 | loc_t = pred_t[..., :4]
94 | fore_mask_t = fore_mask_t.view(batch_size, num_pred, 1).expand_as(loc_pred)
95 | loc_t = loc_t[fore_mask_t].view(-1, 4)
96 | loc_pred = loc_pred[fore_mask_t].view(-1, 4)
97 |
98 | xy_t, wh_t = loc_t[:, :2], loc_t[:, 2:]
99 | xy_pred, wh_pred = loc_pred[:, :2], loc_pred[:, 2:]
100 | # xy_loss = F.binary_cross_entropy(xy_pred, xy_t, scale_factor, size_average=False)
101 |
102 | xy_loss = self.weight_mseloss(xy_pred, xy_t, scale_factor) / 2
103 | wh_loss = self.weight_mseloss(wh_pred, wh_t, scale_factor) / 2
104 |
105 | loc_loss = xy_loss + wh_loss
106 |
107 | loc_loss /= batch_size
108 | conf_loss /= batch_size
109 | cls_loss /= batch_size
110 |
111 | if debug:
112 | print("grid_wh", grid_wh, "xy_loss", round(xy_loss.item(), 5), "wh_loss", round(wh_loss.item(), 5), "cls_loss", round(cls_loss.item(), 5), "ave_cls", round(ave_cls, 5), "Obj", round(Obj, 5), "no_obj", round(no_obj, 5), "fore_conf_loss", round(fore_conf_loss.item(), 5),
113 | "back_conf_loss", round(back_conf_loss.item(), 5))
114 |
115 | return loc_loss, conf_loss, cls_loss
116 |
117 |
118 |
119 |
120 |
121 |
122 |
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 |
131 |
132 |
133 |
--------------------------------------------------------------------------------
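Both loss layers gather the matched predictions by broadcasting a `(batch, num_pred)` foreground mask over the four box coordinates and then reshaping the surviving entries. A minimal sketch with toy shapes (a `bool` mask here; the repo uses `ByteTensor`, the PyTorch 0.4 equivalent):

```python
import torch

batch_size, num_pred = 2, 5
loc_pred = torch.randn(batch_size, num_pred, 4)
fore_mask = torch.zeros(batch_size, num_pred, dtype=torch.bool)
fore_mask[0, 1] = True
fore_mask[1, 3] = True

mask4 = fore_mask.view(batch_size, num_pred, 1).expand_as(loc_pred)
matched = loc_pred[mask4].view(-1, 4)   # one row per matched prediction
print(matched.shape)                    # torch.Size([2, 4])
```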
/layers/yolo_loss.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 | #
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.autograd import Variable
8 | import torch.nn.init as init
9 | import os
10 | from layers.yolo_layer import YoloLayer
11 |
12 |
13 | class YoloLoss(nn.Module):
14 | def __init__(self, input_wh, num_classes, ignore_thresh, anchors, anchors_mask, use_gpu=True):
15 | super(YoloLoss, self).__init__()
16 | self.input_wh = input_wh
17 | self.num_classes = num_classes
18 | self.ignore_thresh = ignore_thresh
19 | self.use_gpu = use_gpu
20 | self.anchors = anchors
21 | self.anchors_mask = anchors_mask
22 | self.yolo_layer1 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[0])
23 | self.yolo_layer2 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[1])
24 | self.yolo_layer3 = YoloLayer(input_wh, num_classes, ignore_thresh, anchors, anchors_mask[2])
25 |
26 | def forward(self, inputs, targets, input_wh, debug):
27 | self.input_wh = input_wh
28 | x, y, z = inputs
29 | batch_size = x.size(0)
30 | loc_loss1, conf_loss1, cls_loss1 = self.yolo_layer1(x, targets, self.input_wh, debug)
31 | loc_loss2, conf_loss2, cls_loss2 = self.yolo_layer2(y, targets, self.input_wh, debug)
32 | loc_loss3, conf_loss3, cls_loss3 = self.yolo_layer3(z, targets, self.input_wh, debug)
33 | loc_loss = loc_loss1 + loc_loss2 + loc_loss3
34 | conf_loss = conf_loss1 + conf_loss2 + conf_loss3
35 | cls_loss = cls_loss1 + cls_loss2 + cls_loss3
36 | loss = loc_loss + conf_loss + cls_loss
37 | return loss
--------------------------------------------------------------------------------
/make.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | cd ./utils/
3 |
4 | CUDA_PATH=/usr/local/cuda/
5 |
6 | python build.py build_ext --inplace
7 | # if you use anaconda3 you may need to add this
8 | mv nms/cpu_nms.cpython-36m-x86_64-linux-gnu.so nms/cpu_nms.so
9 | mv nms/gpu_nms.cpython-36m-x86_64-linux-gnu.so nms/gpu_nms.so
10 | cd ..
11 |
--------------------------------------------------------------------------------
/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/model/__init__.py
--------------------------------------------------------------------------------
/model/darknet53.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 | #
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.autograd import Variable
8 |
9 | class ConvBN(nn.Module):
10 | def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0):
11 | super().__init__()
12 | self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride, padding=padding, bias=False)
13 | self.bn = nn.BatchNorm2d(ch_out, momentum=0.01, eps=1e-05, affine=True)
14 |
15 | def forward(self, x):
16 | return F.leaky_relu(self.bn(self.conv(x)), negative_slope=0.1, inplace=True)
17 |
18 | class DarknetBlock(nn.Module):
19 | def __init__(self, ch_in):
20 | super().__init__()
21 | ch_hid = ch_in // 2
22 | self.conv1 = ConvBN(ch_in, ch_hid, kernel_size=1, stride=1, padding=0)
23 | self.conv2 = ConvBN(ch_hid, ch_in, kernel_size=3, stride=1, padding=1)
24 |
25 | def forward(self, x):
26 | out = self.conv1(x)
27 | out = self.conv2(out)
28 | return out + x
29 |
30 | class Darknet19(nn.Module):
31 | def __init__(self, size):
32 | super().__init__()
33 | self.conv = ConvBN(3, 32, kernel_size=3, stride=1, padding=1)
34 | self.layer1 = self._make_layer1()
35 | self.layer2 = self._make_layer2()
36 | self.layer3 = self._make_layer3()
37 | self.layer4 = self._make_layer4()
38 | self.layer5 = self._make_layer5()
39 |
40 | def _make_layer1(self):
41 | layers = [nn.MaxPool2d(kernel_size=2, stride=2),
42 | ConvBN(32, 64, kernel_size=3, stride=1, padding=1)]
43 | return nn.Sequential(*layers)
44 |
45 | def _make_layer2(self):
46 | layers = [nn.MaxPool2d(kernel_size=2, stride=2),
47 | ConvBN(64, 128, kernel_size=3, stride=1, padding=1),
48 |                   ConvBN(128, 64, kernel_size=1, stride=1, padding=0),
49 | ConvBN(64, 128, kernel_size=3, stride=1, padding=1)]
50 | return nn.Sequential(*layers)
51 |
52 | def _make_layer3(self):
53 | layers = [nn.MaxPool2d(kernel_size=2, stride=2),
54 | ConvBN(128, 256, kernel_size=3, stride=1, padding=1),
55 |                   ConvBN(256, 128, kernel_size=1, stride=1, padding=0),
56 | ConvBN(128, 256, kernel_size=3, stride=1, padding=1)]
57 | return nn.Sequential(*layers)
58 |
59 | def _make_layer4(self):
60 | layers = [nn.MaxPool2d(kernel_size=2, stride=2),
61 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1),
62 |                   ConvBN(512, 256, kernel_size=1, stride=1, padding=0),
63 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1),
64 |                   ConvBN(512, 256, kernel_size=1, stride=1, padding=0),
65 | ConvBN(256, 512, kernel_size=3, stride=1, padding=1)]
66 | return nn.Sequential(*layers)
67 |
68 | def _make_layer5(self):
69 | layers = [nn.MaxPool2d(kernel_size=2, stride=2),
70 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1),
71 |                   ConvBN(1024, 512, kernel_size=1, stride=1, padding=0),
72 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1),
73 |                   ConvBN(1024, 512, kernel_size=1, stride=1, padding=0),
74 | ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)]
75 | return nn.Sequential(*layers)
76 |
77 | def forward(self, x):
78 | out = self.conv(x)
79 |
80 | c1 = self.layer1(out)
81 | c2 = self.layer2(c1)
82 | c3 = self.layer3(c2)
83 | c4 = self.layer4(c3)
84 | c5 = self.layer5(c4)
85 | return (c3, c4, c5)
86 |
87 |
88 | class Darknet53(nn.Module):
89 | def __init__(self, num_blocks):
90 | super().__init__()
91 | self.conv = ConvBN(3, 32, kernel_size=3, stride=1, padding=1)
92 | self.layer1 = self._make_layer(32, num_blocks[0], stride=2)
93 | self.layer2 = self._make_layer(64, num_blocks[1], stride=2)
94 | self.layer3 = self._make_layer(128, num_blocks[2], stride=2)
95 | self.layer4 = self._make_layer(256, num_blocks[3], stride=2)
96 | self.layer5 = self._make_layer(512, num_blocks[4], stride=2)
97 |
98 | def _make_layer(self, ch_in, num_blocks, stride=1):
99 | layers = [ConvBN(ch_in, ch_in*2, stride=stride, padding=1)]
100 | for i in range(num_blocks):
101 | layers.append(DarknetBlock(ch_in * 2))
102 | return nn.Sequential(*layers)
103 |
104 | def forward(self, x):
105 | out = self.conv(x)
106 | c1 = self.layer1(out)
107 | c2 = self.layer2(c1)
108 | c3 = self.layer3(c2)
109 | c4 = self.layer4(c3)
110 | c5 = self.layer5(c4)
111 | return (c3, c4, c5)
112 |
113 |
--------------------------------------------------------------------------------
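A quick shape sanity check for the backbone (assumes this repo is on the import path): with the `[1, 2, 8, 8, 4]` block layout used by `Yolov3`, a 416x416 input should yield c3/c4/c5 at strides 8, 16 and 32.

```python
import torch
from model.darknet53 import Darknet53

net = Darknet53([1, 2, 8, 8, 4]).eval()
with torch.no_grad():
    c3, c4, c5 = net(torch.randn(1, 3, 416, 416))
print(c3.shape, c4.shape, c5.shape)
# expected: (1, 256, 52, 52) (1, 512, 26, 26) (1, 1024, 13, 13)
```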
/model/yolo.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 | #
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from torch.autograd import Variable
8 | import torch.nn.init as init
9 | from model.darknet53 import Darknet53
10 | import os
11 | from utils.box_utils import permute_sigmoid, decode
12 | from layers.yolo_layer import YoloLayer
13 |
14 | def xavier(param):
15 | init.xavier_uniform(param)
16 |
17 | # kaiming_weights_init
18 | def weights_init(m):
19 | for key in m.state_dict():
20 | if key.split('.')[-1] == 'weight':
21 | if 'conv' in key:
22 | init.kaiming_normal_(m.state_dict()[key], mode='fan_out')
23 | if 'bn' in key:
24 | m.state_dict()[key][...] = 1
25 | elif key.split('.')[-1] == 'bias':
26 | m.state_dict()[key][...] = 0
27 |
28 |
29 | # def weights_init(m):
30 | # for key in m.state_dict():
31 | # if key.split('.')[-1] == 'weight':
32 | # if 'conv' in key:
33 | # init.xavier_uniform(m.state_dict()[key])
34 | # if 'bn' in key:
35 | # m.state_dict()[key][...] = 1
36 | # elif key.split('.')[-1] == 'bias':
37 | # m.state_dict()[key][...] = 0
38 |
39 | class ConvBN(nn.Module):
40 | def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=0):
41 | super().__init__()
42 | self.conv = nn.Conv2d(ch_in, ch_out, kernel_size=kernel_size, stride=stride, padding=padding, bias=False)
43 | self.bn = nn.BatchNorm2d(ch_out, momentum=0.01)
44 |
45 | def forward(self, x):
46 | return F.leaky_relu(self.bn(self.conv(x)), negative_slope=0.1, inplace=True)
47 |
48 | class DetectionLayer(nn.Module):
49 | def __init__(self, anchors, anchors_mask, input_wh, num_classes):
50 | super(DetectionLayer, self).__init__()
51 | self.anchors = anchors
52 | self.input_wh = input_wh
53 | self.anchors_mask = anchors_mask
54 | self.num_classes = num_classes
55 |
56 | def forward(self, x):
57 | l_data, m_data, h_data = x
58 | l_grid_wh = (l_data.size(3), l_data.size(2))
59 | m_grid_wh = (m_data.size(3), m_data.size(2))
60 | h_grid_wh = (h_data.size(3), h_data.size(2))
61 |
62 | pred_l, stride_l = permute_sigmoid(l_data, self.input_wh, 3, self.num_classes)
63 | pred_m, stride_m = permute_sigmoid(m_data, self.input_wh, 3, self.num_classes)
64 | pred_h, stride_h = permute_sigmoid(h_data, self.input_wh, 3, self.num_classes)
65 |
66 | anchors1 = self.anchors[self.anchors_mask[0][0]: self.anchors_mask[0][-1]+1]
67 | anchors2 = self.anchors[self.anchors_mask[1][0]: self.anchors_mask[1][-1]+1]
68 | anchors3 = self.anchors[self.anchors_mask[2][0]: self.anchors_mask[2][-1]+1]
69 |
70 | decode_l = decode(pred_l.detach(), self.input_wh, anchors1, self.num_classes, stride_l)
71 | decode_m = decode(pred_m.detach(), self.input_wh, anchors2, self.num_classes, stride_m)
72 | decode_h = decode(pred_h.detach(), self.input_wh, anchors3, self.num_classes, stride_h)
73 | decode_pred = torch.cat((decode_l, decode_m, decode_h), 1)
74 |
75 | return decode_pred
76 |
77 | def predict_conv_list1(num_classes):
78 | layers = list()
79 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)]
80 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)]
81 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)]
82 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)]
83 | layers += [ConvBN(1024, 512, kernel_size=1, stride=1, padding=0)]
84 | layers += [ConvBN(512, 1024, kernel_size=3, stride=1, padding=1)]
85 | layers += [nn.Conv2d(1024, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)]
86 | return layers
87 |
88 | def predict_conv_list2(num_classes):
89 | layers = list()
90 | layers += [ConvBN(768, 256, kernel_size=1, stride=1, padding=0)]
91 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)]
92 | layers += [ConvBN(512, 256, kernel_size=1, stride=1, padding=0)]
93 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)]
94 | layers += [ConvBN(512, 256, kernel_size=1, stride=1, padding=0)]
95 | layers += [ConvBN(256, 512, kernel_size=3, stride=1, padding=1)]
96 | layers += [nn.Conv2d(512, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)]
97 | return layers
98 |
99 | def predict_conv_list3(num_classes):
100 | layers = list()
101 | layers += [ConvBN(384, 128, kernel_size=1, stride=1, padding=0)]
102 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)]
103 | layers += [ConvBN(256, 128, kernel_size=1, stride=1, padding=0)]
104 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)]
105 | layers += [ConvBN(256, 128, kernel_size=1, stride=1, padding=0)]
106 | layers += [ConvBN(128, 256, kernel_size=3, stride=1, padding=1)]
107 | layers += [nn.Conv2d(256, (5 + num_classes) * 3, kernel_size=1, stride=1, padding=0)]
108 | return layers
109 |
110 | class YOLOv3(nn.Module):
111 | def __init__(self, phase, num_blocks, anchors, anchors_mask, input_wh, num_classes):
112 | super().__init__()
113 | self.phase = phase
114 | self.extractor = Darknet53(num_blocks)
115 | self.predict_conv_list1 = nn.ModuleList(predict_conv_list1(num_classes))
116 | self.smooth_conv1 = ConvBN(512, 256, kernel_size=1, stride=1, padding=0)
117 | self.predict_conv_list2 = nn.ModuleList(predict_conv_list2(num_classes))
118 | self.smooth_conv2 = ConvBN(256, 128, kernel_size=1, stride=1, padding=0)
119 | self.predict_conv_list3 = nn.ModuleList(predict_conv_list3(num_classes))
120 | if phase == "test":
121 | self.detection = DetectionLayer(anchors, anchors_mask, input_wh, num_classes)
122 |
123 | def forward(self, x, targets=None):
124 | c3, c4, c5 = self.extractor(x)
125 | x = c5
126 | # predict_list1
127 | for i in range(5):
128 | x = self.predict_conv_list1[i](x)
129 | smt1 = self.smooth_conv1(x)
130 | smt1 = F.upsample(smt1, scale_factor=2, mode='nearest')
131 |
132 | smt1 = torch.cat((smt1, c4), 1)
133 | for i in range(5, 7):
134 | x = self.predict_conv_list1[i](x)
135 | out1 = x
136 |
137 | x = smt1
138 | for i in range(5):
139 | x = self.predict_conv_list2[i](x)
140 | smt2 = self.smooth_conv2(x)
141 | smt2 = F.upsample(smt2, scale_factor=2, mode='nearest')
142 | smt2 = torch.cat((smt2, c3), 1)
143 | for i in range(5, 7):
144 | x = self.predict_conv_list2[i](x)
145 | out2 = x
146 | x = smt2
147 | for i in range(7):
148 | x = self.predict_conv_list3[i](x)
149 | out3 = x
150 |
151 | if self.phase == "test":
152 | detections = self.detection((out1, out2, out3))
153 | return detections
154 | elif self.phase == "train":
155 | detections = (out1, out2, out3)
156 | return detections
157 |
158 | def load_weights(self, base_file):
159 | other, ext = os.path.splitext(base_file)
160 |         if ext == '.pkl' or ext == '.pth':
161 | print('Loading weights into state dict...')
162 | self.extractor.load_state_dict(torch.load(base_file))
163 | print("initing darknet53 ......")
164 | self.predict_conv_list1.apply(weights_init)
165 | self.smooth_conv1.apply(weights_init)
166 | self.predict_conv_list2.apply(weights_init)
167 | self.smooth_conv2.apply(weights_init)
168 | self.predict_conv_list3.apply(weights_init)
169 | print('Finished!')
170 | else:
171 | print('Sorry only .pth and .pkl files supported.')
172 |
173 | def Yolov3(phase, input_wh, anchors, anchors_mask, num_classes):
174 | num_blocks = [1,2,8,8,4]
175 | return YOLOv3(phase, num_blocks, anchors, anchors_mask, input_wh, num_classes)
176 |
--------------------------------------------------------------------------------
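In the "train" phase the network returns the three raw heads, each with `(5 + num_classes) * 3` output channels. A minimal shape check (assumes this repo is importable; the anchors/anchors_mask placeholders below are illustrative only, the real values live in data/config.py and are not used by the train-phase forward pass):

```python
import torch
from model.yolo import Yolov3

num_classes = 80
anchors = [(10, 13)] * 9                           # placeholder, unused in "train" phase
anchors_mask = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   # placeholder
net = Yolov3("train", (416, 416), anchors, anchors_mask, num_classes).eval()
with torch.no_grad():
    out1, out2, out3 = net(torch.randn(1, 3, 416, 416))
print(out1.shape, out2.shape, out3.shape)
# expected: (1, 255, 13, 13) (1, 255, 26, 26) (1, 255, 52, 52)  since (5 + 80) * 3 = 255
```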
/output/output_dog.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_dog.jpg
--------------------------------------------------------------------------------
/output/output_eagle.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_eagle.jpg
--------------------------------------------------------------------------------
/output/output_person.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/output/output_person.jpg
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Written by yq_yao
3 |
4 | import os
5 | os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
6 | import torch
7 | import torch.nn as nn
8 | import torch.optim as optim
9 | import torch.backends.cudnn as cudnn
10 | import torch.nn.init as init
11 | import argparse
12 | import torch.utils.data as data
13 | from data.voc0712 import VOCDetection, detection_collate
14 | from data.coco import COCODetection
15 | from model.yolo import Yolov3
16 | from data.config import voc_config, coco_config
17 | from layers.yolo_loss import YoloLoss
18 | from layers.multiyolo_loss import MultiYoloLoss
19 | import numpy as np
20 | import time
21 | import os
22 | import sys
23 |
24 |
25 | def arg_parse():
26 | """
27 | Parse arguments to the train module
28 | """
29 | parser = argparse.ArgumentParser(
30 | description='Yolov3 pytorch Training')
31 | parser.add_argument('-v', '--version', default='yolov3',
32 | help='')
33 | parser.add_argument("--input_wh", dest = "input_wh", type=int, nargs=2, default = [416, 416])
34 | parser.add_argument('-d', '--dataset', default='VOC',
35 | help='VOC or COCO dataset')
36 | parser.add_argument('-b', '--batch_size', default=64,
37 | type=int, help='Batch size for training')
38 | parser.add_argument('--basenet', default='./weights/convert_darknet53.pth', help='pretrained base model')
39 | parser.add_argument('--ignore_thresh', default=0.5,
40 | type=float, help='ignore_thresh')
41 | parser.add_argument('--subdivisions', default=4,
42 | type=int, help='subdivisions for large batch_size')
43 | parser.add_argument('--num_workers', default=4,
44 | type=int, help='Number of workers used in dataloading')
45 | parser.add_argument('--cuda', default=True,
46 | type=bool, help='Use cuda to train model')
47 | parser.add_argument('--merge_yolo_loss', default=True,
48 | type=bool, help='merge yolo loss')
49 | parser.add_argument('--lr', '--learning-rate',
50 | default=1e-3, type=float, help='initial learning rate')
51 | parser.add_argument('--ngpu', default=2, type=int, help='gpus')
52 |
53 | parser.add_argument('--resume_net', default=None,
54 | help='resume net for retraining')
55 | parser.add_argument('--resume_epoch', default=0,
56 | type=int, help='resume iter for retraining')
57 | parser.add_argument('-max','--max_epoch', default=200,
58 | type=int, help='max epoch for retraining')
59 | parser.add_argument('--save_folder', default='./weights/',
60 | help='Location to save checkpoint models')
61 |
62 | return parser.parse_args()
63 |
64 | def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
65 | """Sets the learning rate
66 | # Adapted from PyTorch Imagenet example:
67 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py
68 | """
69 | if iteration < 1000:
70 | # warm up training
71 | lr = 0.001 * pow((iteration)/1000, 4)
72 | else:
73 | lr = args.lr * (gamma ** (step_index))
74 | for param_group in optimizer.param_groups:
75 | param_group['lr'] = lr
76 | return lr
77 |
78 |
79 | if __name__ == '__main__':
80 | args = arg_parse()
81 | basenet = args.basenet
82 | save_folder = args.save_folder
83 | input_wh = args.input_wh
84 | batch_size = args.batch_size
85 | weight_decay = 0.0005
86 | gamma = 0.1
87 | momentum = 0.9
88 | cuda = args.cuda
89 | dataset_name = args.dataset
90 | subdivisions = args.subdivisions
91 | ignore_thresh = args.ignore_thresh
92 | merge_yolo_loss = args.merge_yolo_loss
93 | if not os.path.exists(save_folder):
94 | os.mkdir(save_folder)
95 | if cuda and torch.cuda.is_available():
96 | torch.set_default_tensor_type('torch.cuda.FloatTensor')
97 | else:
98 | torch.set_default_tensor_type('torch.FloatTensor')
99 |
100 | # different datasets, include coco, voc0712 trainval, coco val
101 | datasets_version = {"VOC": [('0712', '0712_trainval')],
102 | "VOC0712++": [('0712', '0712_trainval_test')],
103 | "VOC2012" : [('2012', '2012_trainval')],
104 | "COCO": [('2014', 'train'), ('2014', 'valminusminival')],
105 | "VOC2007": [('0712', "2007_test")],
106 | "COCOval": [('2014', 'minival')]}
107 |
108 | print('Loading Dataset...')
109 | if dataset_name[0] == "V":
110 | cfg = voc_config
111 | train_dataset = VOCDetection(cfg["root"], datasets_version[dataset_name], input_wh, batch_size, cfg["multiscale"], dataset_name)
112 | elif dataset_name[0] == "C":
113 | cfg = coco_config
114 | train_dataset = COCODetection(cfg["root"], datasets_version[dataset_name], input_wh, batch_size, cfg["multiscale"], dataset_name)
115 | else:
116 |         print('Unknown dataset!')
117 |
118 | # load Yolov3 net
119 | net = Yolov3("train", input_wh, cfg["anchors"], cfg["anchors_mask"], cfg["num_classes"])
120 | if args.resume_net == None:
121 | net.load_weights(basenet)
122 | else:
123 | state_dict = torch.load(args.resume_net)
124 | from collections import OrderedDict
125 | new_state_dict = OrderedDict()
126 | for k, v in state_dict.items():
127 | head = k[:7]
128 | if head == 'module.':
129 | name = k[7:] # remove `module.`
130 | else:
131 | name = k
132 | new_state_dict[name] = v
133 | net.load_state_dict(new_state_dict)
134 | print('Loading resume network...')
135 |
136 | if args.ngpu > 1:
137 | net = torch.nn.DataParallel(net)
138 |
139 | if args.cuda:
140 | net.cuda()
141 | cudnn.benchmark = True
142 |
143 | optimizer = optim.SGD(net.parameters(), lr=args.lr,
144 | momentum=momentum, weight_decay=weight_decay)
145 |
146 | # load yolo loss
147 | if merge_yolo_loss:
148 | criterion = MultiYoloLoss(input_wh, cfg["num_classes"], ignore_thresh, cfg["anchors"], cfg["anchors_mask"])
149 | else:
150 | criterion = YoloLoss(input_wh, cfg["num_classes"], ignore_thresh, cfg["anchors"], cfg["anchors_mask"])
151 | net.train()
152 | ave_loss = -1
153 | epoch = 0 + args.resume_epoch
154 | mini_batch_size = int(batch_size / subdivisions)
155 |
156 | epoch_size = len(train_dataset) // (batch_size)
157 | max_iter = args.max_epoch * epoch_size
158 |
159 | stepvalues_VOC = (160 * epoch_size, 180 * epoch_size, 201 * epoch_size)
160 | stepvalues_COCO = (90 * epoch_size, 120 * epoch_size, 140 * epoch_size)
161 | stepvalues = (stepvalues_VOC, stepvalues_COCO)[args.dataset=='COCO']
162 |
163 | print('Training', args.version, 'on', train_dataset.name)
164 | step_index = 0
165 |
166 | if args.resume_epoch > 0:
167 | start_iter = args.resume_epoch * epoch_size
168 | else:
169 | start_iter = 0
170 |
171 | lr = args.lr
172 |
173 | # begin to train
174 | for iteration in range(start_iter, max_iter):
175 | if iteration % epoch_size == 0:
176 | batch_iterator = iter(data.DataLoader(train_dataset,
177 | mini_batch_size,
178 | shuffle=False,
179 | num_workers=args.num_workers,
180 | collate_fn=detection_collate))
181 | if (epoch % 5 == 0 and epoch > 0) or (epoch % 5 == 0 and epoch > 200):
182 | torch.save(net.state_dict(), args.save_folder+args.version+'_'+args.dataset + '_epoches_'+
183 | repr(epoch) + '.pth')
184 | epoch += 1
185 |
186 | load_t0 = time.time()
187 | if iteration in stepvalues:
188 | step_index += 1
189 | lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size)
190 | debug = False
191 | if iteration % 10 == 0:
192 | debug = True
193 | optimizer.zero_grad()
194 | loss_sum = 0
195 | for i in range(subdivisions):
196 | images, targets = next(batch_iterator)
197 | images.requires_grad_()
198 | if args.cuda:
199 | images = images.cuda()
200 | with torch.no_grad():
201 | targets = [anno.cuda() for anno in targets]
202 | else:
203 | images = images
204 | with torch.no_grad():
205 | targets = targets
206 | # forward
207 | resize_wh = images.size(3), images.size(2)
208 | out = net(images)
209 | loss = criterion(out, targets, resize_wh, debug) / subdivisions
210 | loss.backward()
211 | loss_sum += loss.item()
212 |
213 | if ave_loss < 0:
214 | ave_loss = loss_sum
215 | ave_loss = 0.1 * loss_sum + 0.9 * ave_loss
216 | optimizer.step()
217 | load_t1 = time.time()
218 | if iteration % 10 == 0:
219 | print('Epoch:' + repr(epoch) + ' || epochiter: ' + repr(iteration % epoch_size) + '/' + repr(epoch_size)
220 |                 + '|| Total iter ' +
221 | repr(iteration) + ' Cur : %.4f Ave : %.4f' % (loss_sum, ave_loss) +
222 | ' iteration time: %.4f sec. ||' % (load_t1 - load_t0) + 'LR: %.5f' % (lr))
223 |
224 | torch.save(net.state_dict(), args.save_folder+args.version+'_'+args.dataset + "_final"+ '.pth')
225 |
226 |
227 |
228 |
229 |
230 |
--------------------------------------------------------------------------------
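The `subdivisions` loop in the training code is plain gradient accumulation: the loss of each mini-batch is divided by `subdivisions`, gradients are accumulated across the loop, and one optimizer step is taken per full batch, so the effective batch size is `mini_batch_size * subdivisions` while only `mini_batch_size` samples sit on the GPU at once. A minimal sketch with a stand-in model and random data:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                 # stand-in for the YOLOv3 net
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)
subdivisions, mini_batch_size = 4, 16

optimizer.zero_grad()
loss_sum = 0.0
for _ in range(subdivisions):
    images = torch.randn(mini_batch_size, 10)      # stand-in mini-batch
    targets = torch.randn(mini_batch_size, 2)
    loss = criterion(model(images), targets) / subdivisions  # average over subdivisions
    loss.backward()                                # gradients accumulate across the loop
    loss_sum += loss.item()
optimizer.step()                                   # one update per effective batch
print(loss_sum)
```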
/utils/box_utils.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 | from torch.autograd import Variable
7 | import numpy as np
8 | import math
9 | import cv2
10 | import time
11 | from utils.nms_wrapper import nms
12 |
13 |
14 | def get_rects(detection, input_wh, ori_wh, use_pad=False):
15 | if len(detection) > 0:
16 | if use_pad:
17 | scaling_factor = min(input_wh[0] / ori_wh[0], input_wh[1] / ori_wh[1])
18 | detection[:,[1,3]] -= (input_wh[0] - scaling_factor * ori_wh[0]) / 2
19 | detection[:,[2,4]] -= (input_wh[1] - scaling_factor * ori_wh[1]) / 2
20 | detection[:,1:5] /= scaling_factor
21 | else:
22 | detection[:,[1,3]] /= input_wh[0]
23 | detection[:,[2,4]] /= input_wh[1]
24 | detection[:, [1,3]] *= ori_wh[0]
25 | detection[:, [2,4]] *= ori_wh[1]
26 | for i in range(detection.shape[0]):
27 | detection[i, [1,3]] = torch.clamp(detection[i, [1,3]], 0.0, ori_wh[0])
28 | detection[i, [2,4]] = torch.clamp(detection[i, [2,4]], 0.0, ori_wh[1])
29 | return detection
30 |
31 | def draw_rects(img, rects, classes):
32 | print(rects)
33 | for rect in rects:
34 | if rect[5] > 0.1:
35 | left_top = (int(rect[0]), int(rect[1]))
36 | right_bottom = (int(rect[2]), int(rect[3]))
37 | score = round(rect[4], 3)
38 | cls_id = int(rect[-1])
39 | label = "{0}".format(classes[cls_id])
40 | class_len = len(classes)
41 | offset = cls_id * 123457 % class_len
42 | red = get_color(2, offset, class_len)
43 | green = get_color(1, offset, class_len)
44 | blue = get_color(0, offset, class_len)
45 | color = (blue, green, red)
46 | cv2.rectangle(img, left_top, right_bottom, color, 2)
47 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
48 | right_bottom = left_top[0] + t_size[0] + 3, left_top[1] - t_size[1] - 4
49 | cv2.rectangle(img, left_top, right_bottom, color, -1)
50 | cv2.putText(img, str(label)+str(score), (left_top[0], left_top[1] - t_size[1] - 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1)
51 | return img
52 |
53 | def get_color(c, x, max_val):
54 | colors = torch.FloatTensor([[1,0,1],[0,0,1],[0,1,1],[0,1,0],[1,1,0],[1,0,0]])
55 | ratio = float(x) / max_val * 5
56 | i = int(math.floor(ratio))
57 | j = int(math.ceil(ratio))
58 | ratio = ratio - i
59 | r = (1-ratio) * colors[i][c] + ratio * colors[j][c]
60 | return int(r*255)
61 |
62 |
63 | def unique(tensor):
64 | tensor_np = tensor.cpu().numpy()
65 | unique_np = np.unique(tensor_np)
66 | unique_tensor = torch.from_numpy(unique_np)
67 |
68 | tensor_res = tensor.new(unique_tensor.shape)
69 | tensor_res.copy_(unique_tensor)
70 | return tensor_res
71 |
72 | def point_form(boxes):
73 | """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
74 | representation for comparison to point form ground truth data.
75 | Args:
76 | boxes: (tensor) center-size default boxes from priorbox layers.
77 | Return:
78 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes.
79 | """
80 | return torch.cat((boxes[:, :2] - boxes[:, 2:]/2, # xmin, ymin
81 | boxes[:, :2] + boxes[:, 2:]/2), 1) # xmax, ymax
82 |
83 | def center_size(boxes):
84 | """ Convert prior_boxes to (cx, cy, w, h)
85 | representation for comparison to center-size form ground truth data.
86 | Args:
87 | boxes: (tensor) point_form boxes
88 | Return:
89 |         boxes: (tensor) Converted (cx, cy, w, h) form of boxes.
90 | """
91 | return torch.cat([(boxes[:, 2:] + boxes[:, :2])/2, boxes[:, 2:] - boxes[:, :2]], 1) # w, h
92 |
93 |
94 | def intersect(box_a, box_b):
95 | """ We resize both tensors to [A,B,2] without new malloc:
96 | [A,2] -> [A,1,2] -> [A,B,2]
97 | [B,2] -> [1,B,2] -> [A,B,2]
98 | Then we compute the area of intersect between box_a and box_b.
99 | Args:
100 | box_a: (tensor) bounding boxes, Shape: [A,4].
101 | box_b: (tensor) bounding boxes, Shape: [B,4].
102 | Return:
103 | (tensor) intersection area, Shape: [A,B].
104 | """
105 | # print(box_a)
106 | A = box_a.size(0)
107 | B = box_b.size(0)
108 | max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
109 | box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
110 | min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2),
111 | box_b[:, :2].unsqueeze(0).expand(A, B, 2))
112 | inter = torch.clamp((max_xy - min_xy), min=0)
113 | return inter[:, :, 0] * inter[:, :, 1]
114 |
115 |
116 | def jaccard(box_a, box_b):
117 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap
118 | is simply the intersection over union of two boxes. Here we operate on
119 | ground truth boxes and default boxes.
120 | E.g.:
121 | A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
122 | Args:
123 | box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4]
124 | box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4]
125 | Return:
126 | jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)]
127 | """
128 | inter = intersect(box_a, box_b)
129 | area_a = ((box_a[:, 2]-box_a[:, 0]) *
130 | (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter) # [A,B]
131 | area_b = ((box_b[:, 2]-box_b[:, 0]) *
132 | (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter) # [A,B]
133 | union = area_a + area_b - inter
134 | return inter / union # [A,B]
135 |
136 | def trans_anchors(anchors):
137 | new_anchors = torch.zeros((anchors.size(0), 4))
138 | new_anchors[:, :2] += 2000
139 | new_anchors[:, 2:] = anchors[:,]
140 | return point_form(new_anchors)
141 |
142 | def trans_truths(truths):
143 | new_truths = torch.zeros((truths.size(0), 4))
144 | new_truths[:, :2] += 2000
145 | new_truths[:, 2:] = truths[:, 2:4]
146 | return point_form(new_truths)
147 |
148 | def int_index(anchors_mask, val):
149 | for i in range(len(anchors_mask)):
150 | if val == anchors_mask[i]:
151 | return i
152 | return -1
153 |
154 | def encode_targets_all(input_wh, truths, labels, best_anchor_idx, anchors, feature_dim, num_pred, back_mask):
155 | scale = torch.ones(num_pred).cuda()
156 | encode_truths = torch.zeros((num_pred, 6)).cuda()
157 | fore_mask = torch.zeros(num_pred).cuda()
158 | # l_dim, m_dim, h_dim = feature_dim
159 | l_grid_wh, m_grid_wh, h_grid_wh = feature_dim
160 | for i in range(best_anchor_idx.size(0)):
161 | index = 0
162 | grid_wh = (0, 0)
163 | # mask [0, 1, 2]
164 | if best_anchor_idx[i].item() < 2.1:
165 | grid_wh = l_grid_wh
166 | index_begin = 0
167 | # mask [3, 4, 5]
168 | elif best_anchor_idx[i].item() < 5.1:
169 | grid_wh = m_grid_wh
170 | index_begin = l_grid_wh[0] * l_grid_wh[1] * 3
171 | # mask [6, 7, 8]
172 | else:
173 | grid_wh = h_grid_wh
174 | index_begin = (l_grid_wh[0]*l_grid_wh[1] + m_grid_wh[0]*m_grid_wh[1]) * 3
175 | x = (truths[i][0] / input_wh[0]) * grid_wh[0]
176 | y = (truths[i][1] / input_wh[1]) * grid_wh[1]
177 | floor_x, floor_y = math.floor(x), math.floor(y)
178 | anchor_idx = best_anchor_idx[i].int().item() % 3
179 | index = index_begin + floor_y * grid_wh[0] * 3 + floor_x * 3 + anchor_idx
180 |
181 | scale[index] = scale[index] + 1. - (truths[i][2] / input_wh[0]) * (truths[i][3] / input_wh[1])
182 |
183 | # encode targets x, y, w, h, objectness, class
184 | truths[i][0] = x - floor_x
185 | truths[i][1] = y - floor_y
186 | truths[i][2] = torch.log(truths[i][2] / anchors[best_anchor_idx[i]][0] + 1e-8)
187 | truths[i][3] = torch.log(truths[i][3] / anchors[best_anchor_idx[i]][1] + 1e-8)
188 | encode_truths[index, :4] = truths[i]
189 | encode_truths[index, 4] = 1.
190 | encode_truths[index, 5] = labels[i].int().item()
191 |
192 |         # set foreground mask to 1 and background mask to 0, because each prediction should have a unique target
193 | fore_mask[index] = 1.
194 | back_mask[index] = 0
195 |
196 | return encode_truths, fore_mask > 0, scale, back_mask
197 |
198 | def encode_targets_single(input_wh, truths, labels, best_anchor_idx, anchors, anchors_mask, back_mask, grid_wh):
199 | grid_w, grid_h = grid_wh[0], grid_wh[1]
200 | num_pred = grid_w * grid_h * len(anchors_mask)
201 | scale = torch.ones(num_pred).cuda()
202 | encode_truths = torch.zeros((num_pred, 6)).cuda()
203 | fore_mask = torch.zeros(num_pred).cuda()
204 |
205 | for i in range(best_anchor_idx.size(0)):
206 | mask_n = int_index(anchors_mask, best_anchor_idx[i])
207 | if mask_n < 0:
208 | continue
209 | x = (truths[i][0] / input_wh[0]) * grid_wh[0]
210 | y = (truths[i][1] / input_wh[1]) * grid_wh[1]
211 | floor_x, floor_y = math.floor(x), math.floor(y)
212 | index = floor_y * grid_wh[0] * 3 + floor_x * 3 + mask_n
213 | scale[index] = scale[index] + 1. - (truths[i][2] / input_wh[0]) * (truths[i][3] / input_wh[1])
214 | truths[i][0] = x - floor_x
215 | truths[i][1] = y - floor_y
216 | truths[i][2] = torch.log(truths[i][2] / anchors[best_anchor_idx[i]][0] + 1e-8)
217 | truths[i][3] = torch.log(truths[i][3] / anchors[best_anchor_idx[i]][1] + 1e-8)
218 | encode_truths[index, :4] = truths[i]
219 | encode_truths[index, 4] = 1.
220 | encode_truths[index, 5] = labels[i].int().item()
221 | fore_mask[index] = 1.
222 | back_mask[index] = 0
223 |
224 | return encode_truths, fore_mask > 0, scale, back_mask
225 |
226 | def targets_match_single(input_wh, threshold, targets, pred, anchors, anchors_mask, pred_t, scale_t, fore_mask_t, back_mask_t, grid_wh, idx, cuda=True):
227 | loc_truths = targets[:, :4].data
228 | labels = targets[:,-1].data
229 | overlaps = jaccard(
230 | loc_truths,
231 | point_form(pred))
232 | # (Bipartite Matching)
233 | # [1,num_objects] best prior for each ground truth
234 | # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
235 | # [1,num_priors] best ground truth for each prior
236 |
237 | best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
238 | best_truth_idx.squeeze_(0)
239 | best_truth_overlap.squeeze_(0)
240 | back_mask = (best_truth_overlap - threshold) < 0
241 |
242 | anchors = torch.FloatTensor(anchors)
243 | if cuda:
244 | anchors = anchors.cuda()
245 |
246 | center_truths = center_size(loc_truths)
247 |
248 | # convert anchor and truths to calculate iou
249 | new_anchors = trans_anchors(anchors)
250 | new_truths = trans_truths(center_truths)
251 | overlaps_ = jaccard(
252 | new_truths,
253 | new_anchors)
254 | best_anchor_overlap, best_anchor_idx = overlaps_.max(1, keepdim=True)
255 | best_anchor_idx.squeeze_(1)
256 | best_anchor_overlap.squeeze_(1)
257 |
258 | encode_truths, fore_mask, scale, back_mask = encode_targets_single(input_wh, center_truths, labels, best_anchor_idx, anchors, anchors_mask, back_mask, grid_wh)
259 |
260 | pred_t[idx] = encode_truths
261 | scale_t[idx] = scale
262 | fore_mask_t[idx] = fore_mask
263 | back_mask_t[idx] = back_mask
264 |
265 | def targets_match_all(input_wh, threshold, targets, pred, anchors, feature_dim, pred_t, scale_t, fore_mask_t, back_mask_t, num_pred, idx, cuda=True):
266 | loc_truths = targets[:, :4].data
267 | labels = targets[:,-1].data
268 | overlaps = jaccard(
269 | loc_truths,
270 | point_form(pred))
271 | # (Bipartite Matching)
272 | # [1,num_objects] best prior for each ground truth
273 | # best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
274 | # [1,num_priors] best ground truth for each prior
275 |
276 | best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
277 | best_truth_idx.squeeze_(0)
278 | best_truth_overlap.squeeze_(0)
279 | back_mask = (best_truth_overlap - threshold) < 0
280 |
281 | anchors = torch.FloatTensor(anchors)
282 | if cuda:
283 | anchors = anchors.cuda()
284 |
285 | center_truths = center_size(loc_truths)
286 | new_anchors = trans_anchors(anchors)
287 | new_truths = trans_truths(center_truths)
288 | overlaps_ = jaccard(
289 | new_truths,
290 | new_anchors)
291 | best_anchor_overlap, best_anchor_idx = overlaps_.max(1, keepdim=True)
292 | best_anchor_idx.squeeze_(1)
293 | best_anchor_overlap.squeeze_(1)
294 |
295 | encode_truths, fore_mask, scale, back_mask = encode_targets_all(input_wh, center_truths, labels, best_anchor_idx, anchors, feature_dim, num_pred, back_mask)
296 |
297 | pred_t[idx] = encode_truths
298 | scale_t[idx] = scale
299 | fore_mask_t[idx] = fore_mask
300 | back_mask_t[idx] = back_mask
301 |
302 | def decode(prediction, input_wh, anchors, num_classes, stride_wh, cuda=True):
303 | grid_wh = (input_wh[0] // stride_wh[0], input_wh[1] // stride_wh[1])
304 | grid_w = np.arange(grid_wh[0])
305 | grid_h = np.arange(grid_wh[1])
306 | a,b = np.meshgrid(grid_w, grid_h)
307 |
308 | num_anchors = len(anchors)
309 | x_offset = torch.FloatTensor(a).view(-1,1)
310 | y_offset = torch.FloatTensor(b).view(-1,1)
311 | anchors = [(a[0]/stride_wh[0], a[1]/stride_wh[1]) for a in anchors]
312 | if cuda:
313 | x_offset = x_offset.cuda()
314 | y_offset = y_offset.cuda()
315 | x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1,2).unsqueeze(0)
316 | prediction[:,:,:2] += x_y_offset
317 | anchors = torch.FloatTensor(anchors)
318 | if cuda:
319 | anchors = anchors.cuda()
320 | anchors = anchors.repeat(grid_wh[0]*grid_wh[1], 1).unsqueeze(0)
321 | prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4]) * anchors
322 | prediction[:,:,0] *= stride_wh[0]
323 | prediction[:,:,2] *= stride_wh[0]
324 | prediction[:,:,1] *= stride_wh[1]
325 | prediction[:,:,3] *= stride_wh[1]
326 | return prediction
327 |
328 | def permute_sigmoid(x, input_wh, num_anchors, num_classes):
329 | batch_size = x.size(0)
330 | grid_wh = (x.size(3), x.size(2))
331 | input_w, input_h = input_wh
332 | stride_wh = (input_w // grid_wh[0], input_h // grid_wh[1])
333 | bbox_attrs = 5 + num_classes
334 | x = x.view(batch_size, bbox_attrs*num_anchors, grid_wh[0] * grid_wh[1])
335 | x = x.transpose(1,2).contiguous()
336 | x = x.view(batch_size, grid_wh[0]*grid_wh[1]*num_anchors, bbox_attrs)
337 | x[:,:,0] = torch.sigmoid(x[:,:,0])
338 | x[:,:,1] = torch.sigmoid(x[:,:,1])
339 | x[:,:, 4 : bbox_attrs] = torch.sigmoid((x[:,:, 4 : bbox_attrs]))
340 | return x, stride_wh
341 |
342 | def detection_postprecess(detection, iou_thresh, num_classes, input_wh, ori_wh, use_pad=False, nms_conf=0.4):
343 | assert detection.size(0) == 1, "only support batch_size == 1"
344 | conf_mask = (detection[:,:,4] > iou_thresh).float().unsqueeze(2)
345 | detection = detection * conf_mask
346 | try:
347 | ind_nz = torch.nonzero(detection[:,:,4]).transpose(0,1).contiguous()
348 | except:
349 |         print("no detection results")
350 | return np.empty([0, 5], dtype=np.float32)
351 | bbox_pred = point_form(detection[:, :, :4].view(-1, 4))
352 | conf_pred = detection[:, :, 4].view(-1, 1)
353 | cls_pred = detection[:, :, 5:].view(-1, num_classes)
354 |
355 | max_conf, max_conf_idx = torch.max(cls_pred, 1)
356 |
357 | max_conf = max_conf.float().unsqueeze(1)
358 | max_conf_idx = max_conf_idx.float().unsqueeze(1)
359 |
360 | # score = (conf_pred * max_conf).view(-1, 1)
361 | score = conf_pred
362 | image_pred = torch.cat((bbox_pred, score, max_conf, max_conf_idx), 1)
363 |
364 | non_zero_ind = (torch.nonzero(image_pred[:,4]))
365 | image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1, 7)
366 | try:
367 | img_classes = unique(image_pred_[:,-1])
368 | except:
369 |         print("no class found")
370 | return np.empty([0, 7], dtype=np.float32)
371 | flag = False
372 |     out_put = None
373 | for cls in img_classes:
374 | cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
375 | class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
376 |
377 | image_pred_class = image_pred_[class_mask_ind].view(-1,7)
378 | keep = nms(image_pred_class.cpu().numpy(), nms_conf, force_cpu=True)
379 | image_pred_class = image_pred_class[keep]
380 | if not flag:
381 | out_put = image_pred_class
382 | flag = True
383 | else:
384 | out_put = torch.cat((out_put, image_pred_class), 0)
385 |
386 |
387 | image_pred_class = out_put
388 | if use_pad:
389 | scaling_factor = min(input_wh[0] / ori_wh[0], input_wh[1] / ori_wh[1])
390 | image_pred_class[:,[0,2]] -= (input_wh[0] - scaling_factor * ori_wh[0]) / 2
391 | image_pred_class[:,[1,3]] -= (input_wh[1] - scaling_factor * ori_wh[1]) / 2
392 | image_pred_class[:,:4] /= scaling_factor
393 | else:
394 | image_pred_class[:,[0,2]] /= input_wh[0]
395 | image_pred_class[:,[1,3]] /= input_wh[1]
396 | image_pred_class[:, [0,2]] *= ori_wh[0]
397 | image_pred_class[:, [1,3]] *= ori_wh[1]
398 |
399 | for i in range(image_pred_class.shape[0]):
400 | image_pred_class[i, [0,2]] = torch.clamp(image_pred_class[i, [0,2]], 0.0, ori_wh[0])
401 | image_pred_class[i, [1,3]] = torch.clamp(image_pred_class[i, [1,3]], 0.0, ori_wh[1])
402 | return image_pred_class.cpu().numpy()
403 |
404 |
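The three functions above form the single-image inference path for one detection head: `permute_sigmoid` reshapes the raw convolution output and squashes the x/y, objectness and class logits, `decode` maps the cell-relative predictions back to input-image pixels, and `detection_postprecess` thresholds and runs NMS. A minimal sketch of the first two steps on random data, assuming the repository root is on `PYTHONPATH`; the anchor sizes are the usual coarsest-scale YOLOv3 anchors and are only illustrative here:

```python
# Hedged sketch: decode one 13x13 head on a random tensor (not the repo's demo code).
import torch
from utils.box_utils import permute_sigmoid, decode

num_classes = 80
anchors = [(116, 90), (156, 198), (373, 326)]   # assumed: YOLOv3's large-scale anchors
input_wh = (416, 416)
raw = torch.randn(1, len(anchors) * (5 + num_classes), 13, 13)  # (B, A*(5+C), H, W)

pred, stride_wh = permute_sigmoid(raw, input_wh, len(anchors), num_classes)
pred = decode(pred, input_wh, anchors, num_classes, stride_wh, cuda=False)
print(pred.shape)  # torch.Size([1, 507, 85]); cx, cy, w, h are now in input-pixel units
```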
--------------------------------------------------------------------------------
/utils/build.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import os
9 | from os.path import join as pjoin
10 | import numpy as np
11 | from distutils.core import setup
12 | from distutils.extension import Extension
13 | from Cython.Distutils import build_ext
14 |
15 |
16 | def find_in_path(name, path):
17 | "Find a file in a search path"
18 | # adapted fom http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/
19 | for dir in path.split(os.pathsep):
20 | binpath = pjoin(dir, name)
21 | if os.path.exists(binpath):
22 | return os.path.abspath(binpath)
23 | return None
24 |
25 |
26 | def locate_cuda():
27 | """Locate the CUDA environment on the system
28 |
29 | Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
30 | and values giving the absolute path to each directory.
31 |
32 | Starts by looking for the CUDAHOME env variable. If not found, everything
33 | is based on finding 'nvcc' in the PATH.
34 | """
35 |
36 | # first check if the CUDAHOME env variable is in use
37 | if 'CUDAHOME' in os.environ:
38 | home = os.environ['CUDAHOME']
39 | nvcc = pjoin(home, 'bin', 'nvcc')
40 | else:
41 | # otherwise, search the PATH for NVCC
42 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
43 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
44 | if nvcc is None:
45 | raise EnvironmentError('The nvcc binary could not be '
46 | 'located in your $PATH. Either add it to your path, or set $CUDAHOME')
47 | home = os.path.dirname(os.path.dirname(nvcc))
48 |
49 | cudaconfig = {'home': home, 'nvcc': nvcc,
50 | 'include': pjoin(home, 'include'),
51 | 'lib64': pjoin(home, 'lib64')}
52 | for k, v in cudaconfig.items():
53 | if not os.path.exists(v):
54 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
55 |
56 | return cudaconfig
57 |
58 |
59 | CUDA = locate_cuda()
60 |
61 | # Obtain the numpy include directory. This logic works across numpy versions.
62 | try:
63 | numpy_include = np.get_include()
64 | except AttributeError:
65 | numpy_include = np.get_numpy_include()
66 |
67 |
68 | def customize_compiler_for_nvcc(self):
69 | """inject deep into distutils to customize how the dispatch
70 | to gcc/nvcc works.
71 |
72 | If you subclass UnixCCompiler, it's not trivial to get your subclass
73 | injected in, and still have the right customizations (i.e.
74 | distutils.sysconfig.customize_compiler) run on it. So instead of going
75 |     the OO route, I have this. Note, it's kind of like a weird functional
76 |     subclassing going on."""
77 |
78 |     # tell the compiler it can process .cu source files
79 | self.src_extensions.append('.cu')
80 |
81 |     # save references to the default compiler_so and _compile methods
82 | default_compiler_so = self.compiler_so
83 | super = self._compile
84 |
85 | # now redefine the _compile method. This gets executed for each
86 | # object but distutils doesn't have the ability to change compilers
87 | # based on source extension: we add it.
88 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
89 | print(extra_postargs)
90 | if os.path.splitext(src)[1] == '.cu':
91 | # use the cuda for .cu files
92 | self.set_executable('compiler_so', CUDA['nvcc'])
93 | # use only a subset of the extra_postargs, which are 1-1 translated
94 | # from the extra_compile_args in the Extension class
95 | postargs = extra_postargs['nvcc']
96 | else:
97 | postargs = extra_postargs['gcc']
98 |
99 | super(obj, src, ext, cc_args, postargs, pp_opts)
100 | # reset the default compiler_so, which we might have changed for cuda
101 | self.compiler_so = default_compiler_so
102 |
103 | # inject our redefined _compile method into the class
104 | self._compile = _compile
105 |
106 |
107 | # run the customize_compiler
108 | class custom_build_ext(build_ext):
109 | def build_extensions(self):
110 | customize_compiler_for_nvcc(self.compiler)
111 | build_ext.build_extensions(self)
112 |
113 |
114 | ext_modules = [
115 | Extension(
116 | "nms.cpu_nms",
117 | ["nms/cpu_nms.pyx"],
118 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
119 | include_dirs=[numpy_include]
120 | ),
121 | Extension('nms.gpu_nms',
122 | ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'],
123 | library_dirs=[CUDA['lib64']],
124 | libraries=['cudart'],
125 | language='c++',
126 | runtime_library_dirs=[CUDA['lib64']],
127 | # this syntax is specific to this build system
128 | # we're only going to use certain compiler args with nvcc and not with gcc
129 | # the implementation of this trick is in customize_compiler() below
130 | extra_compile_args={'gcc': ["-Wno-unused-function"],
131 | 'nvcc': ['-arch=sm_61',
132 | '--ptxas-options=-v',
133 | '-c',
134 | '--compiler-options',
135 | "'-fPIC'"]},
136 | include_dirs=[numpy_include, CUDA['include']]
137 | ),
138 | # Extension(
139 | # 'pycocotools._mask',
140 | # sources=['pycocotools/maskApi.c', 'pycocotools/_mask.pyx'],
141 | # include_dirs=[numpy_include, 'pycocotools'],
142 | # extra_compile_args={
143 | # 'gcc': ['-Wno-cpp', '-Wno-unused-function', '-std=c99']},
144 | # ),
145 | ]
146 |
147 | setup(
148 | name='mot_utils',
149 | ext_modules=ext_modules,
150 | # inject our custom trigger
151 | cmdclass={'build_ext': custom_build_ext},
152 | )
153 |
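`customize_compiler_for_nvcc` works because `extra_compile_args` is passed as a dict keyed by compiler instead of the usual flat list: `.cu` sources are handed to `nvcc` with the `'nvcc'` flags, everything else goes to the host compiler with the `'gcc'` flags. Note that `-arch=sm_61` targets Pascal GPUs and may need adjusting for other cards, and the extensions are typically built with `python build.py build_ext --inplace` (presumably what `make.sh` runs; check it for the exact command). A toy sketch of just the dispatch rule, illustrative only and not part of the build:

```python
# Illustrative only: mimics how the overridden _compile picks per-compiler flags.
extra_compile_args = {'gcc': ['-Wno-unused-function'],
                      'nvcc': ['-arch=sm_61', '--compiler-options', "'-fPIC'"]}

def pick_postargs(src):
    # .cu files go to nvcc with the 'nvcc' flags, everything else to gcc.
    key = 'nvcc' if src.endswith('.cu') else 'gcc'
    return key, extra_compile_args[key]

print(pick_postargs('nms/nms_kernel.cu'))  # ('nvcc', [...])
print(pick_postargs('nms/cpu_nms.c'))      # ('gcc', [...])
```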
--------------------------------------------------------------------------------
/utils/gen_anchors.py:
--------------------------------------------------------------------------------
1 | import random
2 | import argparse
3 | import numpy as np
4 | import os
5 | import sys
6 | if sys.version_info[0] == 2:
7 | import xml.etree.cElementTree as ET
8 | else:
9 | import xml.etree.ElementTree as ET
10 | import pickle
11 |
12 | import json
13 |
14 | def parse_voc_annotation(ann_dir, img_dir, train_val_list, cache_name, labels=[]):
15 | if os.path.exists(cache_name):
16 | with open(cache_name, 'rb') as handle:
17 | cache = pickle.load(handle)
18 | all_insts, seen_labels = cache['all_insts'], cache['seen_labels']
19 | else:
20 | all_insts = []
21 | seen_labels = {}
22 |
23 | for ann in sorted(train_val_list):
24 | img = {'object':[]}
25 |
26 | try:
27 | tree = ET.parse(os.path.join(ann_dir, ann + ".xml"))
28 | except Exception as e:
29 | print(e)
30 | print('Ignore this bad annotation: ' + ann_dir + ann)
31 | continue
32 |
33 | for elem in tree.iter():
34 | if 'filename' in elem.tag:
35 | img['filename'] = os.path.join(img_dir, elem.text + ".jpg")
36 | if 'width' in elem.tag:
37 | img['width'] = int(elem.text)
38 | if 'height' in elem.tag:
39 | img['height'] = int(elem.text)
40 | if 'object' in elem.tag or 'part' in elem.tag:
41 | obj = {}
42 |
43 | for attr in list(elem):
44 | if 'name' in attr.tag:
45 | obj['name'] = attr.text
46 |
47 | if obj['name'] in seen_labels:
48 | seen_labels[obj['name']] += 1
49 | else:
50 | seen_labels[obj['name']] = 1
51 |
52 | if len(labels) > 0 and obj['name'] not in labels:
53 | break
54 | else:
55 | img['object'] += [obj]
56 |
57 | if 'bndbox' in attr.tag:
58 | for dim in list(attr):
59 | if 'xmin' in dim.tag:
60 | obj['xmin'] = int(round(float(dim.text)))
61 | if 'ymin' in dim.tag:
62 | obj['ymin'] = int(round(float(dim.text)))
63 | if 'xmax' in dim.tag:
64 | obj['xmax'] = int(round(float(dim.text)))
65 | if 'ymax' in dim.tag:
66 | obj['ymax'] = int(round(float(dim.text)))
67 |
68 | if len(img['object']) > 0:
69 | all_insts += [img]
70 |
71 | cache = {'all_insts': all_insts, 'seen_labels': seen_labels}
72 | with open(cache_name, 'wb') as handle:
73 | pickle.dump(cache, handle, protocol=pickle.HIGHEST_PROTOCOL)
74 |
75 | return all_insts, seen_labels
76 |
77 | def IOU(ann, centroids):
78 | w, h = ann
79 | similarities = []
80 |
81 | for centroid in centroids:
82 | c_w, c_h = centroid
83 |
84 | if c_w >= w and c_h >= h:
85 | similarity = w*h/(c_w*c_h)
86 | elif c_w >= w and c_h <= h:
87 | similarity = w*c_h/(w*h + (c_w-w)*c_h)
88 | elif c_w <= w and c_h >= h:
89 | similarity = c_w*h/(w*h + c_w*(c_h-h))
90 | else: #means both w,h are bigger than c_w and c_h respectively
91 | similarity = (c_w*c_h)/(w*h)
92 | similarities.append(similarity) # will become (k,) shape
93 |
94 | return np.array(similarities)
95 |
96 | def avg_IOU(anns, centroids):
97 | n,d = anns.shape
98 | sum = 0.
99 |
100 | for i in range(anns.shape[0]):
101 | sum+= max(IOU(anns[i], centroids))
102 |
103 | return sum/n
104 |
105 | def print_anchors(centroids):
106 | out_string = ''
107 |
108 | anchors = centroids.copy()
109 |
110 | widths = anchors[:, 0]
111 | sorted_indices = np.argsort(widths)
112 |
113 | r = "anchors: ["
114 | for i in sorted_indices:
115 | out_string += str(int(anchors[i,0]*416)) + ',' + str(int(anchors[i,1]*416)) + ', '
116 |
117 | print(out_string[:-2])
118 |
119 | def run_kmeans(ann_dims, anchor_num):
120 | ann_num = ann_dims.shape[0]
121 | iterations = 0
122 | prev_assignments = np.ones(ann_num)*(-1)
123 | iteration = 0
124 | old_distances = np.zeros((ann_num, anchor_num))
125 |
126 | indices = [random.randrange(ann_dims.shape[0]) for i in range(anchor_num)]
127 | centroids = ann_dims[indices]
128 | anchor_dim = ann_dims.shape[1]
129 |
130 | while True:
131 | distances = []
132 | iteration += 1
133 | for i in range(ann_num):
134 | d = 1 - IOU(ann_dims[i], centroids)
135 | distances.append(d)
136 | distances = np.array(distances) # distances.shape = (ann_num, anchor_num)
137 |
138 | print("iteration {}: dists = {}".format(iteration, np.sum(np.abs(old_distances-distances))))
139 |
140 | #assign samples to centroids
141 | assignments = np.argmin(distances,axis=1)
142 |
143 | if (assignments == prev_assignments).all() :
144 | return centroids
145 |
146 | #calculate new centroids
147 |         centroid_sums = np.zeros((anchor_num, anchor_dim), dtype=np.float64)
148 | for i in range(ann_num):
149 | centroid_sums[assignments[i]]+=ann_dims[i]
150 | for j in range(anchor_num):
151 | centroids[j] = centroid_sums[j]/(np.sum(assignments==j) + 1e-6)
152 |
153 | prev_assignments = assignments.copy()
154 | old_distances = distances.copy()
155 |
156 | def _main_(argv):
157 |     num_anchors = argv.anchors
158 | train_annot_folder = "/localSSD/yyq/VOCdevkit0712/VOC0712/Annotations/"
159 | train_image_folder = "/localSSD/yyq/VOCdevkit0712/VOC0712/JPEGImages/"
160 | train_val_txt = "/localSSD/yyq/VOCdevkit0712/VOC0712/ImageSets/Main/0712_trainval_test.txt"
161 | with open(train_val_txt, "r") as f:
162 | train_val_list = [i.strip() for i in f.readlines()]
163 | cache_name = "voc_train.pkl"
164 | labels = ["aeroplane", "bicycle", "bird", "boat",
165 | "bottle", "bus", "car", "cat", "chair",
166 | "cow", "diningtable", "dog", "horse",
167 | "motorbike", "person", "pottedplant",
168 | "sheep", "sofa", "train", "tvmonitor"]
169 |
170 | train_imgs, train_labels = parse_voc_annotation(
171 | train_annot_folder,
172 | train_image_folder,
173 | train_val_list,
174 | cache_name,
175 | labels
176 | )
177 |
178 | # run k_mean to find the anchors
179 | annotation_dims = []
180 | for image in train_imgs:
181 | # print(image['filename'])
182 | for obj in image['object']:
183 | relative_w = (float(obj['xmax']) - float(obj['xmin']))/image['width']
184 |             relative_h = (float(obj["ymax"]) - float(obj['ymin']))/image['height']
185 |             annotation_dims.append(tuple(map(float, (relative_w, relative_h))))
186 |
187 | annotation_dims = np.array(annotation_dims)
188 | centroids = run_kmeans(annotation_dims, num_anchors)
189 |
190 | # write anchors to file
191 | print('\naverage IOU for', num_anchors, 'anchors:', '%0.2f' % avg_IOU(annotation_dims, centroids))
192 | print_anchors(centroids)
193 |
194 | if __name__ == '__main__':
195 | argparser = argparse.ArgumentParser()
196 |
197 | argparser.add_argument(
198 | '-c',
199 | '--conf',
200 | default='config.json',
201 | help='path to configuration file')
202 | argparser.add_argument(
203 | '-a',
204 | '--anchors',
205 |         type=int, default=9,
206 | help='number of anchors to use')
207 |
208 | args = argparser.parse_args()
209 | _main_(args)
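A self-contained sketch of the clustering step on synthetic relative box dimensions, assuming `utils/` is on `PYTHONPATH`; in the real script the `(w, h)` pairs come from the VOC parsing above, and the hard-coded paths in `_main_` must be pointed at your own VOC0712 layout:

```python
# Hedged sketch: cluster 9 anchors from stand-in box dimensions.
import numpy as np
from gen_anchors import run_kmeans, avg_IOU, print_anchors

np.random.seed(0)
ann_dims = np.random.uniform(0.05, 0.9, size=(500, 2))  # stand-in for relative (w, h) pairs
centroids = run_kmeans(ann_dims, 9)
print('average IOU: %0.2f' % avg_IOU(ann_dims, centroids))
print_anchors(centroids)  # prints the anchors scaled to a 416x416 input
```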
--------------------------------------------------------------------------------
/utils/nms/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yqyao/YOLOv3_Pytorch/ea392f7d418be94605f86ba2b5d167ec30611def/utils/nms/__init__.py
--------------------------------------------------------------------------------
/utils/nms/cpu_nms.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 | cimport numpy as np
10 |
11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
12 | return a if a >= b else b
13 |
14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
15 | return a if a <= b else b
16 |
17 | cdef inline np.float32_t abs(np.float32_t a, np.float32_t b):
18 | return a - b if a >= b else b - a
19 |
20 | def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
21 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
22 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
23 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
24 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
25 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
26 |
27 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
28 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]
29 |
30 | cdef int ndets = dets.shape[0]
31 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \
32 | np.zeros((ndets), dtype=np.int)
33 |
34 | # nominal indices
35 | cdef int _i, _j
36 | # sorted indices
37 | cdef int i, j
38 | # temp variables for box i's (the box currently under consideration)
39 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea
40 | # variables for computing overlap with box j (lower scoring box)
41 | cdef np.float32_t xx1, yy1, xx2, yy2
42 | cdef np.float32_t w, h
43 | cdef np.float32_t inter, ovr
44 |
45 | keep = []
46 | for _i in range(ndets):
47 | i = order[_i]
48 | if suppressed[i] == 1:
49 | continue
50 | keep.append(i)
51 | ix1 = x1[i]
52 | iy1 = y1[i]
53 | ix2 = x2[i]
54 | iy2 = y2[i]
55 | iarea = areas[i]
56 | for _j in range(_i + 1, ndets):
57 | j = order[_j]
58 | if suppressed[j] == 1:
59 | continue
60 | xx1 = max(ix1, x1[j])
61 | yy1 = max(iy1, y1[j])
62 | xx2 = min(ix2, x2[j])
63 | yy2 = min(iy2, y2[j])
64 | w = max(0.0, xx2 - xx1 + 1)
65 | h = max(0.0, yy2 - yy1 + 1)
66 | inter = w * h
67 | ovr = inter / (iarea + areas[j] - inter)
68 | if ovr >= thresh:
69 | suppressed[j] = 1
70 |
71 | return keep
72 |
73 | def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0):
74 | cdef unsigned int N = boxes.shape[0]
75 | cdef float iw, ih, box_area
76 | cdef float ua
77 | cdef int pos = 0
78 | cdef float maxscore = 0
79 | cdef int maxpos = 0
80 | cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov
81 |
82 | for i in range(N):
83 | maxscore = boxes[i, 4]
84 | maxpos = i
85 |
86 | tx1 = boxes[i,0]
87 | ty1 = boxes[i,1]
88 | tx2 = boxes[i,2]
89 | ty2 = boxes[i,3]
90 | ts = boxes[i,4]
91 |
92 | pos = i + 1
93 | # get max box
94 | while pos < N:
95 | if maxscore < boxes[pos, 4]:
96 | maxscore = boxes[pos, 4]
97 | maxpos = pos
98 | pos = pos + 1
99 |
100 | # add max box as a detection
101 | boxes[i,0] = boxes[maxpos,0]
102 | boxes[i,1] = boxes[maxpos,1]
103 | boxes[i,2] = boxes[maxpos,2]
104 | boxes[i,3] = boxes[maxpos,3]
105 | boxes[i,4] = boxes[maxpos,4]
106 |
107 | # swap ith box with position of max box
108 | boxes[maxpos,0] = tx1
109 | boxes[maxpos,1] = ty1
110 | boxes[maxpos,2] = tx2
111 | boxes[maxpos,3] = ty2
112 | boxes[maxpos,4] = ts
113 |
114 | tx1 = boxes[i,0]
115 | ty1 = boxes[i,1]
116 | tx2 = boxes[i,2]
117 | ty2 = boxes[i,3]
118 | ts = boxes[i,4]
119 |
120 | pos = i + 1
121 | # NMS iterations, note that N changes if detection boxes fall below threshold
122 | while pos < N:
123 | x1 = boxes[pos, 0]
124 | y1 = boxes[pos, 1]
125 | x2 = boxes[pos, 2]
126 | y2 = boxes[pos, 3]
127 | s = boxes[pos, 4]
128 |
129 | area = (x2 - x1 + 1) * (y2 - y1 + 1)
130 | iw = (min(tx2, x2) - max(tx1, x1) + 1)
131 | if iw > 0:
132 | ih = (min(ty2, y2) - max(ty1, y1) + 1)
133 | if ih > 0:
134 | ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
135 | ov = iw * ih / ua #iou between max box and detection box
136 |
137 | if method == 1: # linear
138 | if ov > Nt:
139 | weight = 1 - ov
140 | else:
141 | weight = 1
142 | elif method == 2: # gaussian
143 | weight = np.exp(-(ov * ov)/sigma)
144 | else: # original NMS
145 | if ov > Nt:
146 | weight = 0
147 | else:
148 | weight = 1
149 |
150 | boxes[pos, 4] = weight*boxes[pos, 4]
151 |
152 | # if box score falls below threshold, discard the box by swapping with last box
153 | # update N
154 | if boxes[pos, 4] < threshold:
155 | boxes[pos,0] = boxes[N-1, 0]
156 | boxes[pos,1] = boxes[N-1, 1]
157 | boxes[pos,2] = boxes[N-1, 2]
158 | boxes[pos,3] = boxes[N-1, 3]
159 | boxes[pos,4] = boxes[N-1, 4]
160 | N = N - 1
161 | pos = pos - 1
162 |
163 | pos = pos + 1
164 |
165 | keep = [i for i in range(N)]
166 | return keep
167 |
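`cpu_soft_nms` rescales scores instead of hard-dropping boxes: `method=1` applies a linear decay `1 - IoU` once the overlap exceeds `Nt`, `method=2` a Gaussian decay `exp(-IoU^2 / sigma)`, and anything else falls back to classic hard NMS. A plain-Python illustration of the three weighting rules for a single overlap value (not the compiled code itself):

```python
import numpy as np

def soft_nms_weight(ov, method, Nt=0.3, sigma=0.5):
    # Mirrors the per-box weight computed inside cpu_soft_nms.
    if method == 1:                       # linear
        return 1 - ov if ov > Nt else 1.0
    if method == 2:                       # gaussian
        return float(np.exp(-(ov * ov) / sigma))
    return 0.0 if ov > Nt else 1.0        # original hard NMS

for m in (0, 1, 2):
    print(m, round(soft_nms_weight(0.6, m), 3))  # 0 -> 0.0, 1 -> 0.4, 2 -> 0.487
```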
--------------------------------------------------------------------------------
/utils/nms/gpu_nms.hpp:
--------------------------------------------------------------------------------
1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
2 | int boxes_dim, float nms_overlap_thresh, int device_id);
3 |
--------------------------------------------------------------------------------
/utils/nms/gpu_nms.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Faster R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 | cimport numpy as np
10 |
11 | assert sizeof(int) == sizeof(np.int32_t)
12 |
13 | cdef extern from "gpu_nms.hpp":
14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int)
15 |
16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh,
17 | np.int32_t device_id=0):
18 | cdef int boxes_num = dets.shape[0]
19 | cdef int boxes_dim = dets.shape[1]
20 | cdef int num_out
21 | cdef np.ndarray[np.int32_t, ndim=1] \
22 | keep = np.zeros(boxes_num, dtype=np.int32)
23 | cdef np.ndarray[np.float32_t, ndim=1] \
24 | scores = dets[:, 4]
25 | cdef np.ndarray[np.int_t, ndim=1] \
26 | order = scores.argsort()[::-1]
27 | cdef np.ndarray[np.float32_t, ndim=2] \
28 | sorted_dets = dets[order, :]
29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id)
30 | keep = keep[:num_out]
31 | return list(order[keep])
32 |
--------------------------------------------------------------------------------
/utils/nms/nms_kernel.cu:
--------------------------------------------------------------------------------
1 | // ------------------------------------------------------------------
2 | // Faster R-CNN
3 | // Copyright (c) 2015 Microsoft
4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details]
5 | // Written by Shaoqing Ren
6 | // ------------------------------------------------------------------
7 |
8 | #include "gpu_nms.hpp"
9 | #include <vector>
10 | #include <iostream>
11 |
12 | #define CUDA_CHECK(condition) \
13 | /* Code block avoids redefinition of cudaError_t error */ \
14 | do { \
15 | cudaError_t error = condition; \
16 | if (error != cudaSuccess) { \
17 | std::cout << cudaGetErrorString(error) << std::endl; \
18 | } \
19 | } while (0)
20 |
21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
22 | int const threadsPerBlock = sizeof(unsigned long long) * 8;
23 |
24 | __device__ inline float devIoU(float const * const a, float const * const b) {
25 | float left = max(a[0], b[0]), right = min(a[2], b[2]);
26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
28 | float interS = width * height;
29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
31 | return interS / (Sa + Sb - interS);
32 | }
33 |
34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
35 | const float *dev_boxes, unsigned long long *dev_mask) {
36 | const int row_start = blockIdx.y;
37 | const int col_start = blockIdx.x;
38 |
39 | // if (row_start > col_start) return;
40 |
41 | const int row_size =
42 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
43 | const int col_size =
44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);
45 |
46 | __shared__ float block_boxes[threadsPerBlock * 5];
47 | if (threadIdx.x < col_size) {
48 | block_boxes[threadIdx.x * 5 + 0] =
49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
50 | block_boxes[threadIdx.x * 5 + 1] =
51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
52 | block_boxes[threadIdx.x * 5 + 2] =
53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
54 | block_boxes[threadIdx.x * 5 + 3] =
55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
56 | block_boxes[threadIdx.x * 5 + 4] =
57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
58 | }
59 | __syncthreads();
60 |
61 | if (threadIdx.x < row_size) {
62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
63 | const float *cur_box = dev_boxes + cur_box_idx * 5;
64 | int i = 0;
65 | unsigned long long t = 0;
66 | int start = 0;
67 | if (row_start == col_start) {
68 | start = threadIdx.x + 1;
69 | }
70 | for (i = start; i < col_size; i++) {
71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
72 | t |= 1ULL << i;
73 | }
74 | }
75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
76 | dev_mask[cur_box_idx * col_blocks + col_start] = t;
77 | }
78 | }
79 |
80 | void _set_device(int device_id) {
81 | int current_device;
82 |   CUDA_CHECK(cudaGetDevice(&current_device));
83 | if (current_device == device_id) {
84 | return;
85 | }
86 | // The call to cudaSetDevice must come before any calls to Get, which
87 | // may perform initialization using the GPU.
88 | CUDA_CHECK(cudaSetDevice(device_id));
89 | }
90 |
91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
92 | int boxes_dim, float nms_overlap_thresh, int device_id) {
93 | _set_device(device_id);
94 |
95 | float* boxes_dev = NULL;
96 | unsigned long long* mask_dev = NULL;
97 |
98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock);
99 |
100 | CUDA_CHECK(cudaMalloc(&boxes_dev,
101 | boxes_num * boxes_dim * sizeof(float)));
102 | CUDA_CHECK(cudaMemcpy(boxes_dev,
103 | boxes_host,
104 | boxes_num * boxes_dim * sizeof(float),
105 | cudaMemcpyHostToDevice));
106 |
107 | CUDA_CHECK(cudaMalloc(&mask_dev,
108 | boxes_num * col_blocks * sizeof(unsigned long long)));
109 |
110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
111 | DIVUP(boxes_num, threadsPerBlock));
112 | dim3 threads(threadsPerBlock);
113 |   nms_kernel<<<blocks, threads>>>(boxes_num,
114 | nms_overlap_thresh,
115 | boxes_dev,
116 | mask_dev);
117 |
118 |   std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
119 | CUDA_CHECK(cudaMemcpy(&mask_host[0],
120 | mask_dev,
121 | sizeof(unsigned long long) * boxes_num * col_blocks,
122 | cudaMemcpyDeviceToHost));
123 |
124 |   std::vector<unsigned long long> remv(col_blocks);
125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);
126 |
127 | int num_to_keep = 0;
128 | for (int i = 0; i < boxes_num; i++) {
129 | int nblock = i / threadsPerBlock;
130 | int inblock = i % threadsPerBlock;
131 |
132 | if (!(remv[nblock] & (1ULL << inblock))) {
133 | keep_out[num_to_keep++] = i;
134 | unsigned long long *p = &mask_host[0] + i * col_blocks;
135 | for (int j = nblock; j < col_blocks; j++) {
136 | remv[j] |= p[j];
137 | }
138 | }
139 | }
140 | *num_out = num_to_keep;
141 |
142 | CUDA_CHECK(cudaFree(boxes_dev));
143 | CUDA_CHECK(cudaFree(mask_dev));
144 | }
145 |
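Each CUDA block compares a 64-box "row" tile against a 64-box "column" tile (`threadsPerBlock` is 64, the width of an `unsigned long long`) and writes one 64-bit suppression mask per tile pair; the final keep list is then computed on the host by the loop at the end of `_nms`. A small Python sketch of that host-side reduction, illustrative only since the real code is the C++ above:

```python
def reduce_masks(n_boxes, col_blocks, mask_host, bits=64):
    # Keep box i only if no previously-kept box has set i's bit in the mask.
    remv = [0] * col_blocks
    keep = []
    for i in range(n_boxes):
        nblock, inblock = divmod(i, bits)
        if not (remv[nblock] & (1 << inblock)):
            keep.append(i)
            for j in range(nblock, col_blocks):
                remv[j] |= mask_host[i * col_blocks + j]
    return keep

# toy: three boxes, box 0 suppresses box 1 -> keep [0, 2]
print(reduce_masks(3, 1, [0b010, 0b000, 0b000]))
```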
--------------------------------------------------------------------------------
/utils/nms/py_cpu_nms.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 |
10 | def py_cpu_nms(dets, thresh):
11 | """Pure Python NMS baseline."""
12 | x1 = dets[:, 0]
13 | y1 = dets[:, 1]
14 | x2 = dets[:, 2]
15 | y2 = dets[:, 3]
16 | scores = dets[:, 4]
17 |
18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1)
19 | order = scores.argsort()[::-1]
20 |
21 | keep = []
22 | while order.size > 0:
23 | i = order[0]
24 | keep.append(i)
25 | xx1 = np.maximum(x1[i], x1[order[1:]])
26 | yy1 = np.maximum(y1[i], y1[order[1:]])
27 | xx2 = np.minimum(x2[i], x2[order[1:]])
28 | yy2 = np.minimum(y2[i], y2[order[1:]])
29 |
30 | w = np.maximum(0.0, xx2 - xx1 + 1)
31 | h = np.maximum(0.0, yy2 - yy1 + 1)
32 | inter = w * h
33 | ovr = inter / (areas[i] + areas[order[1:]] - inter)
34 |
35 | inds = np.where(ovr <= thresh)[0]
36 | order = order[inds + 1]
37 |
38 | return keep
39 |
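A tiny usage sketch of the pure-Python baseline with three boxes in `[x1, y1, x2, y2, score]` form, two of which overlap almost completely (assumes `utils/nms/` is on `PYTHONPATH`):

```python
import numpy as np
from py_cpu_nms import py_cpu_nms

dets = np.array([[ 10,  10, 110, 110, 0.9],
                 [ 12,  12, 108, 108, 0.8],   # near-duplicate of the first box
                 [200, 200, 300, 300, 0.7]], dtype=np.float32)
print(py_cpu_nms(dets, thresh=0.5))  # [0, 2]; the duplicate is suppressed
```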
--------------------------------------------------------------------------------
/utils/nms_wrapper.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | from .nms.cpu_nms import cpu_nms, cpu_soft_nms
9 | from .nms.gpu_nms import gpu_nms
10 |
11 | def nms(dets, thresh, force_cpu=False):
12 | """Dispatch to either CPU or GPU NMS implementations."""
13 |
14 | if dets.shape[0] == 0:
15 | return []
16 | if force_cpu:
17 | #return cpu_soft_nms(dets, thresh, method = 0)
18 | return cpu_nms(dets, thresh)
19 | return gpu_nms(dets, thresh)
20 |
21 | def soft_nms(dets, Nt=0.3, sigma=0.5, thresh=0.001, method=1):
22 | """Dispatch to either CPU or GPU NMS implementations."""
23 |
24 | if dets.shape[0] == 0:
25 | return []
26 | return cpu_soft_nms(dets, sigma, Nt, thresh, method)
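The wrapper defaults to the GPU kernel; `detection_postprecess` in `utils/box_utils.py` calls it with `force_cpu=True`, which uses the Cython CPU path instead. Both paths need the extensions compiled by `utils/build.py`. A hedged usage sketch, assuming the repository root is on `PYTHONPATH` and the extensions are built:

```python
import numpy as np
from utils.nms_wrapper import nms

dets = np.array([[10, 10, 110, 110, 0.9],
                 [12, 12, 108, 108, 0.8],
                 [200, 200, 300, 300, 0.7]], dtype=np.float32)
print(nms(dets, 0.45, force_cpu=True))  # [0, 2], matching the pure-Python baseline above
```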
--------------------------------------------------------------------------------
/utils/preprocess.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 | from torch.autograd import Variable
7 | import numpy as np
8 | import cv2
9 |
10 |
11 | def letterbox_image(img, resize_wh):
12 | '''resize image with unchanged aspect ratio using padding'''
13 | img_w, img_h = img.shape[1], img.shape[0]
14 | w, h = resize_wh
15 | new_w = int(img_w * min(w/img_w, h/img_h))
16 | new_h = int(img_h * min(w/img_w, h/img_h))
17 | resized_image = cv2.resize(img, (new_w, new_h), interpolation = cv2.INTER_CUBIC)
18 |
19 | canvas = np.full((resize_wh[1], resize_wh[0], 3), 128)
20 |
21 | canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image
22 |
23 | return canvas
24 |
25 | def preproc_for_test(img, resize_wh, use_pad=False):
26 | if not use_pad:
27 | img = cv2.resize(img, resize_wh)
28 | else:
29 | img = letterbox_image(img, resize_wh)
30 | img_ = img[:,:,::-1].transpose((2,0,1)).copy()
31 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0)
32 | return img_
33 |
34 |
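`preproc_for_test` supports two resize strategies: a plain stretch to the network input size, or `letterbox_image`, which keeps the aspect ratio and pads the remaining canvas with gray (128). A minimal sketch on a dummy frame, assuming `utils/` is on `PYTHONPATH`; the 416x416 size is just the usual YOLOv3 input, not something this file fixes:

```python
import numpy as np
from preprocess import preproc_for_test

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # H x W x C, BGR as returned by cv2.imread
stretched = preproc_for_test(frame, (416, 416))                 # plain resize
padded    = preproc_for_test(frame, (416, 416), use_pad=True)   # letterbox with gray padding
print(stretched.shape, padded.shape)  # both torch.Size([1, 3, 416, 416])
```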
--------------------------------------------------------------------------------
/utils/pycocotools/__init__.py:
--------------------------------------------------------------------------------
1 | __author__ = 'tylin'
2 |
--------------------------------------------------------------------------------
/utils/pycocotools/_mask.pyx:
--------------------------------------------------------------------------------
1 | # distutils: language = c
2 | # distutils: sources = ../common/maskApi.c
3 |
4 | #**************************************************************************
5 | # Microsoft COCO Toolbox. version 2.0
6 | # Data, paper, and tutorials available at: http://mscoco.org/
7 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
8 | # Licensed under the Simplified BSD License [see coco/license.txt]
9 | #**************************************************************************
10 |
11 | __author__ = 'tsungyi'
12 |
13 | import sys
14 | PYTHON_VERSION = sys.version_info[0]
15 |
16 | # import both Python-level and C-level symbols of Numpy
17 | # the API uses Numpy to interface C and Python
18 | import numpy as np
19 | cimport numpy as np
20 | from libc.stdlib cimport malloc, free
21 |
22 | # intialized Numpy. must do.
23 | np.import_array()
24 |
25 | # import numpy C function
26 | # we use PyArray_ENABLEFLAGS to make Numpy ndarray responsible to memoery management
27 | cdef extern from "numpy/arrayobject.h":
28 | void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
29 |
30 | # Declare the prototype of the C functions in MaskApi.h
31 | cdef extern from "maskApi.h":
32 | ctypedef unsigned int uint
33 | ctypedef unsigned long siz
34 | ctypedef unsigned char byte
35 | ctypedef double* BB
36 | ctypedef struct RLE:
37 | siz h,
38 | siz w,
39 | siz m,
40 | uint* cnts,
41 | void rlesInit( RLE **R, siz n )
42 | void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n )
43 | void rleDecode( const RLE *R, byte *mask, siz n )
44 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect )
45 | void rleArea( const RLE *R, siz n, uint *a )
46 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o )
47 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o )
48 | void rleToBbox( const RLE *R, BB bb, siz n )
49 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n )
50 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w )
51 | char* rleToString( const RLE *R )
52 | void rleFrString( RLE *R, char *s, siz h, siz w )
53 |
54 | # python class to wrap RLE array in C
55 | # the class handles the memory allocation and deallocation
56 | cdef class RLEs:
57 | cdef RLE *_R
58 | cdef siz _n
59 |
60 | def __cinit__(self, siz n =0):
61 | rlesInit(&self._R, n)
62 | self._n = n
63 |
64 | # free the RLE array here
65 | def __dealloc__(self):
66 | if self._R is not NULL:
67 | for i in range(self._n):
68 | free(self._R[i].cnts)
69 | free(self._R)
70 | def __getattr__(self, key):
71 | if key == 'n':
72 | return self._n
73 | raise AttributeError(key)
74 |
75 | # python class to wrap Mask array in C
76 | # the class handles the memory allocation and deallocation
77 | cdef class Masks:
78 | cdef byte *_mask
79 | cdef siz _h
80 | cdef siz _w
81 | cdef siz _n
82 |
83 | def __cinit__(self, h, w, n):
84 |         self._mask = <byte*> malloc(h*w*n* sizeof(byte))
85 | self._h = h
86 | self._w = w
87 | self._n = n
88 | # def __dealloc__(self):
89 | # the memory management of _mask has been passed to np.ndarray
90 | # it doesn't need to be freed here
91 |
92 | # called when passing into np.array() and return an np.ndarray in column-major order
93 | def __array__(self):
94 | cdef np.npy_intp shape[1]
95 | shape[0] = self._h*self._w*self._n
96 | # Create a 1D array, and reshape it to fortran/Matlab column-major array
97 | ndarray = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT8, self._mask).reshape((self._h, self._w, self._n), order='F')
98 | # The _mask allocated by Masks is now handled by ndarray
99 | PyArray_ENABLEFLAGS(ndarray, np.NPY_OWNDATA)
100 | return ndarray
101 |
102 | # internal conversion from Python RLEs object to compressed RLE format
103 | def _toString(RLEs Rs):
104 | cdef siz n = Rs.n
105 | cdef bytes py_string
106 | cdef char* c_string
107 | objs = []
108 | for i in range(n):
109 |         c_string = rleToString( <RLE*> &Rs._R[i] )
110 | py_string = c_string
111 | objs.append({
112 | 'size': [Rs._R[i].h, Rs._R[i].w],
113 | 'counts': py_string
114 | })
115 | free(c_string)
116 | return objs
117 |
118 | # internal conversion from compressed RLE format to Python RLEs object
119 | def _frString(rleObjs):
120 | cdef siz n = len(rleObjs)
121 | Rs = RLEs(n)
122 | cdef bytes py_string
123 | cdef char* c_string
124 | for i, obj in enumerate(rleObjs):
125 | if PYTHON_VERSION == 2:
126 | py_string = str(obj['counts']).encode('utf8')
127 | elif PYTHON_VERSION == 3:
128 | py_string = str.encode(obj['counts']) if type(obj['counts']) == str else obj['counts']
129 | else:
130 | raise Exception('Python version must be 2 or 3')
131 | c_string = py_string
132 | rleFrString( &Rs._R[i], c_string, obj['size'][0], obj['size'][1] )
133 | return Rs
134 |
135 | # encode mask to RLEs objects
136 | # list of RLE string can be generated by RLEs member function
137 | def encode(np.ndarray[np.uint8_t, ndim=3, mode='fortran'] mask):
138 | h, w, n = mask.shape[0], mask.shape[1], mask.shape[2]
139 | cdef RLEs Rs = RLEs(n)
140 |     rleEncode(Rs._R, <byte*> mask.data, h, w, n)
141 | objs = _toString(Rs)
142 | return objs
143 |
144 | # decode mask from compressed list of RLE string or RLEs object
145 | def decode(rleObjs):
146 | cdef RLEs Rs = _frString(rleObjs)
147 | h, w, n = Rs._R[0].h, Rs._R[0].w, Rs._n
148 | masks = Masks(h, w, n)
149 | rleDecode(Rs._R, masks._mask, n);
150 | return np.array(masks)
151 |
152 | def merge(rleObjs, intersect=0):
153 | cdef RLEs Rs = _frString(rleObjs)
154 | cdef RLEs R = RLEs(1)
155 | rleMerge(Rs._R, R._R, Rs._n, intersect)
156 | obj = _toString(R)[0]
157 | return obj
158 |
159 | def area(rleObjs):
160 | cdef RLEs Rs = _frString(rleObjs)
161 |     cdef uint* _a = <uint*> malloc(Rs._n* sizeof(uint))
162 | rleArea(Rs._R, Rs._n, _a)
163 | cdef np.npy_intp shape[1]
164 | shape[0] = Rs._n
165 | a = np.array((Rs._n, ), dtype=np.uint8)
166 | a = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT32, _a)
167 | PyArray_ENABLEFLAGS(a, np.NPY_OWNDATA)
168 | return a
169 |
170 | # iou computation. support function overload (RLEs-RLEs and bbox-bbox).
171 | def iou( dt, gt, pyiscrowd ):
172 | def _preproc(objs):
173 | if len(objs) == 0:
174 | return objs
175 | if type(objs) == np.ndarray:
176 | if len(objs.shape) == 1:
177 | objs = objs.reshape((objs[0], 1))
178 | # check if it's Nx4 bbox
179 | if not len(objs.shape) == 2 or not objs.shape[1] == 4:
180 | raise Exception('numpy ndarray input is only for *bounding boxes* and should have Nx4 dimension')
181 | objs = objs.astype(np.double)
182 | elif type(objs) == list:
183 | # check if list is in box format and convert it to np.ndarray
184 | isbox = np.all(np.array([(len(obj)==4) and ((type(obj)==list) or (type(obj)==np.ndarray)) for obj in objs]))
185 | isrle = np.all(np.array([type(obj) == dict for obj in objs]))
186 | if isbox:
187 | objs = np.array(objs, dtype=np.double)
188 | if len(objs.shape) == 1:
189 | objs = objs.reshape((1,objs.shape[0]))
190 | elif isrle:
191 | objs = _frString(objs)
192 | else:
193 | raise Exception('list input can be bounding box (Nx4) or RLEs ([RLE])')
194 | else:
195 | raise Exception('unrecognized type. The following type: RLEs (rle), np.ndarray (box), and list (box) are supported.')
196 | return objs
197 | def _rleIou(RLEs dt, RLEs gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou):
198 |         rleIou( <RLE*> dt._R, <RLE*> gt._R, m, n, <byte*> iscrowd.data, <double*> _iou.data )
199 | def _bbIou(np.ndarray[np.double_t, ndim=2] dt, np.ndarray[np.double_t, ndim=2] gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou):
200 |         bbIou( <BB> dt.data, <BB> gt.data, m, n, <byte*> iscrowd.data, <double*> _iou.data )
201 | def _len(obj):
202 | cdef siz N = 0
203 | if type(obj) == RLEs:
204 | N = obj.n
205 | elif len(obj)==0:
206 | pass
207 | elif type(obj) == np.ndarray:
208 | N = obj.shape[0]
209 | return N
210 | # convert iscrowd to numpy array
211 | cdef np.ndarray[np.uint8_t, ndim=1] iscrowd = np.array(pyiscrowd, dtype=np.uint8)
212 | # simple type checking
213 | cdef siz m, n
214 | dt = _preproc(dt)
215 | gt = _preproc(gt)
216 | m = _len(dt)
217 | n = _len(gt)
218 | if m == 0 or n == 0:
219 | return []
220 | if not type(dt) == type(gt):
221 | raise Exception('The dt and gt should have the same data type, either RLEs, list or np.ndarray')
222 |
223 | # define local variables
224 | cdef double* _iou = 0
225 | cdef np.npy_intp shape[1]
226 | # check type and assign iou function
227 | if type(dt) == RLEs:
228 | _iouFun = _rleIou
229 | elif type(dt) == np.ndarray:
230 | _iouFun = _bbIou
231 | else:
232 | raise Exception('input data type not allowed.')
233 |     _iou = <double*> malloc(m*n* sizeof(double))
234 | iou = np.zeros((m*n, ), dtype=np.double)
235 | shape[0] = m*n
236 | iou = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _iou)
237 | PyArray_ENABLEFLAGS(iou, np.NPY_OWNDATA)
238 | _iouFun(dt, gt, iscrowd, m, n, iou)
239 | return iou.reshape((m,n), order='F')
240 |
241 | def toBbox( rleObjs ):
242 | cdef RLEs Rs = _frString(rleObjs)
243 | cdef siz n = Rs.n
244 |     cdef BB _bb = <BB> malloc(4*n* sizeof(double))
245 | rleToBbox( Rs._R, _bb, n )
246 | cdef np.npy_intp shape[1]
247 | shape[0] = 4*n
248 | bb = np.array((1,4*n), dtype=np.double)
249 | bb = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _bb).reshape((n, 4))
250 | PyArray_ENABLEFLAGS(bb, np.NPY_OWNDATA)
251 | return bb
252 |
253 | def frBbox(np.ndarray[np.double_t, ndim=2] bb, siz h, siz w ):
254 | cdef siz n = bb.shape[0]
255 | Rs = RLEs(n)
256 |     rleFrBbox( <RLE*> Rs._R, <const BB> bb.data, h, w, n )
257 | objs = _toString(Rs)
258 | return objs
259 |
260 | def frPoly( poly, siz h, siz w ):
261 | cdef np.ndarray[np.double_t, ndim=1] np_poly
262 | n = len(poly)
263 | Rs = RLEs(n)
264 | for i, p in enumerate(poly):
265 | np_poly = np.array(p, dtype=np.double, order='F')
266 |         rleFrPoly( &Rs._R[i], <const double*> np_poly.data, int(len(p)/2), h, w )
267 | objs = _toString(Rs)
268 | return objs
269 |
270 | def frUncompressedRLE(ucRles, siz h, siz w):
271 | cdef np.ndarray[np.uint32_t, ndim=1] cnts
272 | cdef RLE R
273 | cdef uint *data
274 | n = len(ucRles)
275 | objs = []
276 | for i in range(n):
277 | Rs = RLEs(1)
278 | cnts = np.array(ucRles[i]['counts'], dtype=np.uint32)
279 | # time for malloc can be saved here but it's fine
280 |         data = <uint*> malloc(len(cnts)* sizeof(uint))
281 | for j in range(len(cnts)):
282 | data[j] = cnts[j]
283 | R = RLE(ucRles[i]['size'][0], ucRles[i]['size'][1], len(cnts), data)
284 | Rs._R[0] = R
285 | objs.append(_toString(Rs)[0])
286 | return objs
287 |
288 | def frPyObjects(pyobj, h, w):
289 | # encode rle from a list of python objects
290 | if type(pyobj) == np.ndarray:
291 | objs = frBbox(pyobj, h, w)
292 | elif type(pyobj) == list and len(pyobj[0]) == 4:
293 | objs = frBbox(pyobj, h, w)
294 | elif type(pyobj) == list and len(pyobj[0]) > 4:
295 | objs = frPoly(pyobj, h, w)
296 | elif type(pyobj) == list and type(pyobj[0]) == dict \
297 | and 'counts' in pyobj[0] and 'size' in pyobj[0]:
298 | objs = frUncompressedRLE(pyobj, h, w)
299 | # encode rle from single python object
300 | elif type(pyobj) == list and len(pyobj) == 4:
301 | objs = frBbox([pyobj], h, w)[0]
302 | elif type(pyobj) == list and len(pyobj) > 4:
303 | objs = frPoly([pyobj], h, w)[0]
304 | elif type(pyobj) == dict and 'counts' in pyobj and 'size' in pyobj:
305 | objs = frUncompressedRLE([pyobj], h, w)[0]
306 | else:
307 | raise Exception('input type is not supported.')
308 | return objs
309 |
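These Cython helpers are exposed through `utils/pycocotools/mask.py` and follow the standard pycocotools RLE API. A hedged round-trip sketch; note that the `pycocotools._mask` Extension is commented out in `utils/build.py` above, so this assumes you build it yourself or fall back to the upstream `pycocotools` package:

```python
import numpy as np
from utils.pycocotools import mask as maskUtils  # or: from pycocotools import mask as maskUtils

m = np.zeros((32, 32, 1), dtype=np.uint8, order='F')
m[8:24, 8:24, 0] = 1                                   # a 16x16 square
rles = maskUtils.encode(m)                             # list with one compressed RLE dict
print(maskUtils.area(rles), maskUtils.toBbox(rles))    # [256] [[ 8.  8. 16. 16.]]
assert (maskUtils.decode(rles) == m).all()             # lossless round trip
```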
--------------------------------------------------------------------------------
/utils/pycocotools/coco.py:
--------------------------------------------------------------------------------
1 | __author__ = 'tylin'
2 | __version__ = '2.0'
3 | # Interface for accessing the Microsoft COCO dataset.
4 |
5 | # Microsoft COCO is a large image dataset designed for object detection,
6 | # segmentation, and caption generation. pycocotools is a Python API that
7 | # assists in loading, parsing and visualizing the annotations in COCO.
8 | # Please visit http://mscoco.org/ for more information on COCO, including
9 | # for the data, paper, and tutorials. The exact format of the annotations
10 | # is also described on the COCO website. For example usage of the pycocotools
11 | # please see pycocotools_demo.ipynb. In addition to this API, please download both
12 | # the COCO images and annotations in order to run the demo.
13 |
14 | # An alternative to using the API is to load the annotations directly
15 | # into Python dictionary
16 | # Using the API provides additional utility functions. Note that this API
17 | # supports both *instance* and *caption* annotations. In the case of
18 | # captions not all functions are defined (e.g. categories are undefined).
19 |
20 | # The following API functions are defined:
21 | # COCO - COCO api class that loads COCO annotation file and prepare data structures.
22 | # decodeMask - Decode binary mask M encoded via run-length encoding.
23 | # encodeMask - Encode binary mask M using run-length encoding.
24 | # getAnnIds - Get ann ids that satisfy given filter conditions.
25 | # getCatIds - Get cat ids that satisfy given filter conditions.
26 | # getImgIds - Get img ids that satisfy given filter conditions.
27 | # loadAnns - Load anns with the specified ids.
28 | # loadCats - Load cats with the specified ids.
29 | # loadImgs - Load imgs with the specified ids.
30 | # annToMask - Convert segmentation in an annotation to binary mask.
31 | # showAnns - Display the specified annotations.
32 | # loadRes - Load algorithm results and create API for accessing them.
33 | # download - Download COCO images from mscoco.org server.
34 | # Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
35 | # Help on each functions can be accessed by: "help COCO>function".
36 |
37 | # See also COCO>decodeMask,
38 | # COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds,
39 | # COCO>getImgIds, COCO>loadAnns, COCO>loadCats,
40 | # COCO>loadImgs, COCO>annToMask, COCO>showAnns
41 |
42 | # Microsoft COCO Toolbox. version 2.0
43 | # Data, paper, and tutorials available at: http://mscoco.org/
44 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2014.
45 | # Licensed under the Simplified BSD License [see bsd.txt]
46 |
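A short usage sketch of the lookup chain listed above, with a placeholder annotation path (assumes `utils` is importable as a package; the upstream `pycocotools` package works identically):

```python
from utils.pycocotools.coco import COCO

coco = COCO('annotations/instances_train2017.json')    # placeholder path
cat_ids = coco.getCatIds(catNms=['person'])
img_ids = coco.getImgIds(catIds=cat_ids)
ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids, iscrowd=None)
anns = coco.loadAnns(ann_ids)
print(len(anns), 'person annotations in the first image')
```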
47 | import json
48 | import time
49 | import matplotlib.pyplot as plt
50 | from matplotlib.collections import PatchCollection
51 | from matplotlib.patches import Polygon
52 | import numpy as np
53 | import copy
54 | import itertools
55 | from . import mask as maskUtils
56 | import os
57 | from collections import defaultdict
58 | import sys
59 | PYTHON_VERSION = sys.version_info[0]
60 | if PYTHON_VERSION == 2:
61 | from urllib import urlretrieve
62 | elif PYTHON_VERSION == 3:
63 | from urllib.request import urlretrieve
64 |
65 | class COCO:
66 | def __init__(self, annotation_file=None):
67 | """
68 | Constructor of Microsoft COCO helper class for reading and visualizing annotations.
69 | :param annotation_file (str): location of annotation file
70 | :param image_folder (str): location to the folder that hosts images.
71 | :return:
72 | """
73 | # load dataset
74 | self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
75 | self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
76 | if not annotation_file == None:
77 | print('loading annotations into memory...')
78 | tic = time.time()
79 | dataset = json.load(open(annotation_file, 'r'))
80 | assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
81 | print('Done (t={:0.2f}s)'.format(time.time()- tic))
82 | self.dataset = dataset
83 | self.createIndex()
84 |
85 | def createIndex(self):
86 | # create index
87 | print('creating index...')
88 | anns, cats, imgs = {}, {}, {}
89 | imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
90 | if 'annotations' in self.dataset:
91 | for ann in self.dataset['annotations']:
92 | imgToAnns[ann['image_id']].append(ann)
93 | anns[ann['id']] = ann
94 |
95 | if 'images' in self.dataset:
96 | for img in self.dataset['images']:
97 | imgs[img['id']] = img
98 |
99 | if 'categories' in self.dataset:
100 | for cat in self.dataset['categories']:
101 | cats[cat['id']] = cat
102 |
103 | if 'annotations' in self.dataset and 'categories' in self.dataset:
104 | for ann in self.dataset['annotations']:
105 | catToImgs[ann['category_id']].append(ann['image_id'])
106 |
107 | print('index created!')
108 |
109 | # create class members
110 | self.anns = anns
111 | self.imgToAnns = imgToAnns
112 | self.catToImgs = catToImgs
113 | self.imgs = imgs
114 | self.cats = cats
115 |
116 | def info(self):
117 | """
118 | Print information about the annotation file.
119 | :return:
120 | """
121 | for key, value in self.dataset['info'].items():
122 | print('{}: {}'.format(key, value))
123 |
124 | def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
125 | """
126 | Get ann ids that satisfy given filter conditions. default skips that filter
127 | :param imgIds (int array) : get anns for given imgs
128 | catIds (int array) : get anns for given cats
129 | areaRng (float array) : get anns for given area range (e.g. [0 inf])
130 | iscrowd (boolean) : get anns for given crowd label (False or True)
131 | :return: ids (int array) : integer array of ann ids
132 | """
133 | imgIds = imgIds if type(imgIds) == list else [imgIds]
134 | catIds = catIds if type(catIds) == list else [catIds]
135 |
136 | if len(imgIds) == len(catIds) == len(areaRng) == 0:
137 | anns = self.dataset['annotations']
138 | else:
139 | if not len(imgIds) == 0:
140 | lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns]
141 | anns = list(itertools.chain.from_iterable(lists))
142 | else:
143 | anns = self.dataset['annotations']
144 | anns = anns if len(catIds) == 0 else [ann for ann in anns if ann['category_id'] in catIds]
145 | anns = anns if len(areaRng) == 0 else [ann for ann in anns if ann['area'] > areaRng[0] and ann['area'] < areaRng[1]]
146 | if not iscrowd == None:
147 | ids = [ann['id'] for ann in anns if ann['iscrowd'] == iscrowd]
148 | else:
149 | ids = [ann['id'] for ann in anns]
150 | return ids
151 |
152 | def getCatIds(self, catNms=[], supNms=[], catIds=[]):
153 | """
154 | filtering parameters. default skips that filter.
155 | :param catNms (str array) : get cats for given cat names
156 | :param supNms (str array) : get cats for given supercategory names
157 | :param catIds (int array) : get cats for given cat ids
158 | :return: ids (int array) : integer array of cat ids
159 | """
160 | catNms = catNms if type(catNms) == list else [catNms]
161 | supNms = supNms if type(supNms) == list else [supNms]
162 | catIds = catIds if type(catIds) == list else [catIds]
163 |
164 | if len(catNms) == len(supNms) == len(catIds) == 0:
165 | cats = self.dataset['categories']
166 | else:
167 | cats = self.dataset['categories']
168 | cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms]
169 | cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms]
170 | cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds]
171 | ids = [cat['id'] for cat in cats]
172 | return ids
173 |
174 | def getImgIds(self, imgIds=[], catIds=[]):
175 | '''
176 | Get img ids that satisfy given filter conditions.
177 | :param imgIds (int array) : get imgs for given ids
178 | :param catIds (int array) : get imgs with all given cats
179 | :return: ids (int array) : integer array of img ids
180 | '''
181 | imgIds = imgIds if type(imgIds) == list else [imgIds]
182 | catIds = catIds if type(catIds) == list else [catIds]
183 |
184 | if len(imgIds) == len(catIds) == 0:
185 | ids = self.imgs.keys()
186 | else:
187 | ids = set(imgIds)
188 | for i, catId in enumerate(catIds):
189 | if i == 0 and len(ids) == 0:
190 | ids = set(self.catToImgs[catId])
191 | else:
192 | ids &= set(self.catToImgs[catId])
193 | return list(ids)
194 |
195 | def loadAnns(self, ids=[]):
196 | """
197 | Load anns with the specified ids.
198 | :param ids (int array) : integer ids specifying anns
199 | :return: anns (object array) : loaded ann objects
200 | """
201 | if type(ids) == list:
202 | return [self.anns[id] for id in ids]
203 | elif type(ids) == int:
204 | return [self.anns[ids]]
205 |
206 | def loadCats(self, ids=[]):
207 | """
208 | Load cats with the specified ids.
209 | :param ids (int array) : integer ids specifying cats
210 | :return: cats (object array) : loaded cat objects
211 | """
212 | if type(ids) == list:
213 | return [self.cats[id] for id in ids]
214 | elif type(ids) == int:
215 | return [self.cats[ids]]
216 |
217 | def loadImgs(self, ids=[]):
218 | """
219 |         Load imgs with the specified ids.
220 | :param ids (int array) : integer ids specifying img
221 | :return: imgs (object array) : loaded img objects
222 | """
223 | if type(ids) == list:
224 | return [self.imgs[id] for id in ids]
225 | elif type(ids) == int:
226 | return [self.imgs[ids]]
227 |
228 | def showAnns(self, anns):
229 | """
230 | Display the specified annotations.
231 | :param anns (array of object): annotations to display
232 | :return: None
233 | """
234 | if len(anns) == 0:
235 | return 0
236 | if 'segmentation' in anns[0] or 'keypoints' in anns[0]:
237 | datasetType = 'instances'
238 | elif 'caption' in anns[0]:
239 | datasetType = 'captions'
240 | else:
241 | raise Exception('datasetType not supported')
242 | if datasetType == 'instances':
243 | ax = plt.gca()
244 | ax.set_autoscale_on(False)
245 | polygons = []
246 | color = []
247 | for ann in anns:
248 | c = (np.random.random((1, 3))*0.6+0.4).tolist()[0]
249 | if 'segmentation' in ann:
250 | if type(ann['segmentation']) == list:
251 | # polygon
252 | for seg in ann['segmentation']:
253 | poly = np.array(seg).reshape((int(len(seg)/2), 2))
254 | polygons.append(Polygon(poly))
255 | color.append(c)
256 | else:
257 | # mask
258 | t = self.imgs[ann['image_id']]
259 | if type(ann['segmentation']['counts']) == list:
260 | rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width'])
261 | else:
262 | rle = [ann['segmentation']]
263 | m = maskUtils.decode(rle)
264 | img = np.ones( (m.shape[0], m.shape[1], 3) )
265 | if ann['iscrowd'] == 1:
266 | color_mask = np.array([2.0,166.0,101.0])/255
267 | if ann['iscrowd'] == 0:
268 | color_mask = np.random.random((1, 3)).tolist()[0]
269 | for i in range(3):
270 | img[:,:,i] = color_mask[i]
271 | ax.imshow(np.dstack( (img, m*0.5) ))
272 | if 'keypoints' in ann and type(ann['keypoints']) == list:
273 | # turn skeleton into zero-based index
274 | sks = np.array(self.loadCats(ann['category_id'])[0]['skeleton'])-1
275 | kp = np.array(ann['keypoints'])
276 | x = kp[0::3]
277 | y = kp[1::3]
278 | v = kp[2::3]
279 | for sk in sks:
280 | if np.all(v[sk]>0):
281 | plt.plot(x[sk],y[sk], linewidth=3, color=c)
282 | plt.plot(x[v>0], y[v>0],'o',markersize=8, markerfacecolor=c, markeredgecolor='k',markeredgewidth=2)
283 | plt.plot(x[v>1], y[v>1],'o',markersize=8, markerfacecolor=c, markeredgecolor=c, markeredgewidth=2)
284 | p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
285 | ax.add_collection(p)
286 | p = PatchCollection(polygons, facecolor='none', edgecolors=color, linewidths=2)
287 | ax.add_collection(p)
288 | elif datasetType == 'captions':
289 | for ann in anns:
290 | print(ann['caption'])
291 |
292 | def loadRes(self, resFile):
293 | """
294 | Load result file and return a result api object.
295 | :param resFile (str) : file name of result file
296 | :return: res (obj) : result api object
297 | """
298 | res = COCO()
299 | res.dataset['images'] = [img for img in self.dataset['images']]
300 |
301 | print('Loading and preparing results...')
302 | tic = time.time()
303 |         if type(resFile) == str:
304 | anns = json.load(open(resFile))
305 | elif type(resFile) == np.ndarray:
306 | anns = self.loadNumpyAnnotations(resFile)
307 | else:
308 | anns = resFile
309 |         assert type(anns) == list, 'results is not an array of objects'
310 | annsImgIds = [ann['image_id'] for ann in anns]
311 | assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
312 | 'Results do not correspond to current coco set'
313 | if 'caption' in anns[0]:
314 | imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
315 | res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
316 | for id, ann in enumerate(anns):
317 | ann['id'] = id+1
318 | elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
319 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
320 | for id, ann in enumerate(anns):
321 | bb = ann['bbox']
322 | x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
323 | if not 'segmentation' in ann:
324 | ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
325 | ann['area'] = bb[2]*bb[3]
326 | ann['id'] = id+1
327 | ann['iscrowd'] = 0
328 | elif 'segmentation' in anns[0]:
329 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
330 | for id, ann in enumerate(anns):
331 | # now only support compressed RLE format as segmentation results
332 | ann['area'] = maskUtils.area(ann['segmentation'])
333 | if not 'bbox' in ann:
334 | ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
335 | ann['id'] = id+1
336 | ann['iscrowd'] = 0
337 | elif 'keypoints' in anns[0]:
338 | res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
339 | for id, ann in enumerate(anns):
340 | s = ann['keypoints']
341 | x = s[0::3]
342 | y = s[1::3]
343 | x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
344 | ann['area'] = (x1-x0)*(y1-y0)
345 | ann['id'] = id + 1
346 | ann['bbox'] = [x0,y0,x1-x0,y1-y0]
347 | print('DONE (t={:0.2f}s)'.format(time.time()- tic))
348 |
349 | res.dataset['annotations'] = anns
350 | res.createIndex()
351 | return res
352 |
353 | def download(self, tarDir = None, imgIds = [] ):
354 | '''
355 | Download COCO images from mscoco.org server.
356 |         :param tarDir (str): target directory to save the downloaded images
357 | imgIds (list): images to be downloaded
358 | :return:
359 | '''
360 | if tarDir is None:
361 | print('Please specify target directory')
362 | return -1
363 | if len(imgIds) == 0:
364 | imgs = self.imgs.values()
365 | else:
366 | imgs = self.loadImgs(imgIds)
367 | N = len(imgs)
368 | if not os.path.exists(tarDir):
369 | os.makedirs(tarDir)
370 | for i, img in enumerate(imgs):
371 | tic = time.time()
372 | fname = os.path.join(tarDir, img['file_name'])
373 | if not os.path.exists(fname):
374 | urlretrieve(img['coco_url'], fname)
375 | print('downloaded {}/{} images (t={:0.1f}s)'.format(i, N, time.time()- tic))
376 |
377 | def loadNumpyAnnotations(self, data):
378 | """
379 | Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
380 |         :param data (numpy.ndarray): Nx7 array of detection results
381 | :return: annotations (python nested list)
382 | """
383 | print('Converting ndarray to lists...')
384 | assert(type(data) == np.ndarray)
385 | print(data.shape)
386 | assert(data.shape[1] == 7)
387 | N = data.shape[0]
388 | ann = []
389 | for i in range(N):
390 | if i % 1000000 == 0:
391 | print('{}/{}'.format(i,N))
392 | ann += [{
393 | 'image_id' : int(data[i, 0]),
394 | 'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ],
395 | 'score' : data[i, 5],
396 | 'category_id': int(data[i, 6]),
397 | }]
398 | return ann
399 |
400 | def annToRLE(self, ann):
401 | """
402 |         Convert annotation which can be polygons or uncompressed RLE to compressed RLE.
403 |         :return: rle (dict) : run-length encoding of the annotation mask
404 | """
405 | t = self.imgs[ann['image_id']]
406 | h, w = t['height'], t['width']
407 | segm = ann['segmentation']
408 | if type(segm) == list:
409 | # polygon -- a single object might consist of multiple parts
410 | # we merge all parts into one mask rle code
411 | rles = maskUtils.frPyObjects(segm, h, w)
412 | rle = maskUtils.merge(rles)
413 | elif type(segm['counts']) == list:
414 | # uncompressed RLE
415 | rle = maskUtils.frPyObjects(segm, h, w)
416 | else:
417 | # rle
418 | rle = ann['segmentation']
419 | return rle
420 |
421 | def annToMask(self, ann):
422 | """
423 | Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
424 | :return: binary mask (numpy 2D array)
425 | """
426 | rle = self.annToRLE(ann)
427 | m = maskUtils.decode(rle)
428 | return m
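
The result-loading path above (`loadNumpyAnnotations` feeding `loadRes`) accepts detections either as a JSON result file or as an Nx7 ndarray with rows `{imageID, x1, y1, w, h, score, class}`. Below is a minimal sketch of passing such an array and scoring it with the bundled `cocoeval.py`; the annotation path, image ids, and category ids are placeholders for illustration only.

```python
import numpy as np
from utils.pycocotools.coco import COCO
from utils.pycocotools.cocoeval import COCOeval

# Hypothetical ground-truth file -- replace with the actual COCO annotation path.
coco_gt = COCO('annotations/instances_val2017.json')

# Nx7 detections: {imageID, x, y, w, h, score, class}; the ids here are made up
# and must exist in the ground-truth set (loadRes asserts this).
dets = np.array([
    [139, 100.0, 200.0, 50.0, 80.0, 0.92,  1],
    [139, 300.0, 120.0, 40.0, 60.0, 0.75, 18],
])

coco_dt = coco_gt.loadRes(dets)               # result COCO object built via loadNumpyAnnotations
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```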
--------------------------------------------------------------------------------
/utils/pycocotools/mask.py:
--------------------------------------------------------------------------------
1 | __author__ = 'tsungyi'
2 |
3 | #import pycocotools._mask as _mask
4 | from . import _mask
5 |
6 | # Interface for manipulating masks stored in RLE format.
7 | #
8 | # RLE is a simple yet efficient format for storing binary masks. RLE
9 | # first divides a vector (or vectorized image) into a series of piecewise
10 | # constant regions and then for each piece simply stores the length of
11 | # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
12 | # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
13 | # (note that the odd counts are always the numbers of zeros). Instead of
14 | # storing the counts directly, additional compression is achieved with a
15 | # variable bitrate representation based on a common scheme called LEB128.
16 | #
17 | # Compression is greatest given large piecewise constant regions.
18 | # Specifically, the size of the RLE is proportional to the number of
19 | # *boundaries* in M (or for an image the number of boundaries in the y
20 | # direction). Assuming fairly simple shapes, the RLE representation is
21 | # O(sqrt(n)) where n is number of pixels in the object. Hence space usage
22 | # is substantially lower, especially for large simple objects (large n).
23 | #
24 | # Many common operations on masks can be computed directly using the RLE
25 | # (without need for decoding). This includes computations such as area,
26 | # union, intersection, etc. All of these operations are linear in the
27 | # size of the RLE, in other words they are O(sqrt(n)) where n is the area
28 | # of the object. Computing these operations on the original mask is O(n).
29 | # Thus, using the RLE can result in substantial computational savings.
30 | #
31 | # The following API functions are defined:
32 | # encode - Encode binary masks using RLE.
33 | # decode - Decode binary masks encoded via RLE.
34 | # merge - Compute union or intersection of encoded masks.
35 | # iou - Compute intersection over union between masks.
36 | # area - Compute area of encoded masks.
37 | # toBbox - Get bounding boxes surrounding encoded masks.
38 | # frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask.
39 | #
40 | # Usage:
41 | # Rs = encode( masks )
42 | # masks = decode( Rs )
43 | # R = merge( Rs, intersect=false )
44 | # o = iou( dt, gt, iscrowd )
45 | # a = area( Rs )
46 | # bbs = toBbox( Rs )
47 | # Rs = frPyObjects( [pyObjects], h, w )
48 | #
49 | # In the API the following formats are used:
50 | # Rs - [dict] Run-length encoding of binary masks
51 | # R - dict Run-length encoding of binary mask
52 | # masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order)
53 | # iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore
54 | # bbs - [nx4] Bounding box(es) stored as [x y w h]
55 | # poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list)
56 | # dt,gt - May be either bounding boxes or encoded masks
57 | # Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel).
58 | #
59 | # Finally, a note about the intersection over union (iou) computation.
60 | # The standard iou of a ground truth (gt) and detected (dt) object is
61 | # iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt))
62 | # For "crowd" regions, we use a modified criteria. If a gt object is
63 | # marked as "iscrowd", we allow a dt to match any subregion of the gt.
64 | # Choosing gt' in the crowd gt that best matches the dt can be done using
65 | # gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
66 | # iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
67 | # For crowd gt regions we use this modified criteria above for the iou.
68 | #
69 | # To compile run "python setup.py build_ext --inplace"
70 | # Please do not contact us for help with compiling.
71 | #
72 | # Microsoft COCO Toolbox. version 2.0
73 | # Data, paper, and tutorials available at: http://mscoco.org/
74 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
75 | # Licensed under the Simplified BSD License [see coco/license.txt]
76 |
77 | iou = _mask.iou
78 | merge = _mask.merge
79 | frPyObjects = _mask.frPyObjects
80 |
81 | def encode(bimask):
82 | if len(bimask.shape) == 3:
83 | return _mask.encode(bimask)
84 | elif len(bimask.shape) == 2:
85 | h, w = bimask.shape
86 | return _mask.encode(bimask.reshape((h, w, 1), order='F'))[0]
87 |
88 | def decode(rleObjs):
89 | if type(rleObjs) == list:
90 | return _mask.decode(rleObjs)
91 | else:
92 | return _mask.decode([rleObjs])[:,:,0]
93 |
94 | def area(rleObjs):
95 | if type(rleObjs) == list:
96 | return _mask.area(rleObjs)
97 | else:
98 | return _mask.area([rleObjs])[0]
99 |
100 | def toBbox(rleObjs):
101 | if type(rleObjs) == list:
102 | return _mask.toBbox(rleObjs)
103 | else:
104 | return _mask.toBbox([rleObjs])[0]
105 |
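
The wrappers above call into the compiled `_mask` extension; per the header comment, `encode` expects uint8 masks in column-major (Fortran) order. A minimal usage sketch, assuming the extension has been built (e.g. via the repo's build step) and the script runs from the repository root:

```python
import numpy as np
from utils.pycocotools import mask as maskUtils

# Tiny 4x5 binary mask; encode() requires dtype=uint8 in column-major (Fortran) order.
m = np.zeros((4, 5), dtype=np.uint8, order='F')
m[1:3, 1:4] = 1                      # a 2x3 block of foreground pixels

rle = maskUtils.encode(m)            # compressed RLE dict: {'size': [4, 5], 'counts': ...}
print(maskUtils.area(rle))           # 6  (foreground pixel count)
print(maskUtils.toBbox(rle))         # [1. 1. 3. 2.]  -> [x, y, w, h]
assert (maskUtils.decode(rle) == m).all()
```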
--------------------------------------------------------------------------------
/utils/pycocotools/maskApi.c:
--------------------------------------------------------------------------------
1 | /**************************************************************************
2 | * Microsoft COCO Toolbox. version 2.0
3 | * Data, paper, and tutorials available at: http://mscoco.org/
4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
5 | * Licensed under the Simplified BSD License [see coco/license.txt]
6 | **************************************************************************/
7 | #include "maskApi.h"
8 | #include <math.h>
9 | #include <stdlib.h>
10 |
11 | uint umin( uint a, uint b ) { return (a<b) ? a : b; }
12 | uint umax( uint a, uint b ) { return (a>b) ? a : b; }
13 |
14 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) {
15 | R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m);
16 |   siz j; if(cnts) for(j=0; j<m; j++) R->cnts[j]=cnts[j];
17 | }
18 |
19 | void rleFree( RLE *R ) {
20 | free(R->cnts); R->cnts=0;
21 | }
22 |
23 | void rlesInit( RLE **R, siz n ) {
24 | siz i; *R = (RLE*) malloc(sizeof(RLE)*n);
25 | for(i=0; i0 ) {
61 | c=umin(ca,cb); cc+=c; ct=0;
62 | ca-=c; if(!ca && a0) {
83 | crowd=iscrowd!=NULL && iscrowd[g];
84 | if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; }
85 | siz ka, kb, a, b; uint c, ca, cb, ct, i, u; int va, vb;
86 | ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0;
87 | cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1;
88 | while( ct>0 ) {
89 | c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0;
90 | ca-=c; if(!ca && athr) keep[j]=0;
105 | }
106 | }
107 | }
108 |
109 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) {
110 | double h, w, i, u, ga, da; siz g, d; int crowd;
111 | for( g=0; gthr) keep[j]=0;
129 | }
130 | }
131 | }
132 |
133 | void rleToBbox( const RLE *R, BB bb, siz n ) {
134 | siz i; for( i=0; id?1:c=dy && xs>xe) || (dxye);
173 | if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; }
174 | s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy;
175 | if(dx>=dy) for( d=0; d<=dx; d++ ) {
176 | t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++;
177 | } else for( d=0; d<=dy; d++ ) {
178 | t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++;
179 | }
180 | }
181 | /* get points along y-boundary and downsample */
182 | free(x); free(y); k=m; m=0; double xd, yd;
183 | x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k);
184 | for( j=1; jw-1 ) continue;
187 | yd=(double)(v[j]h) yd=h; yd=ceil(yd);
189 | x[m]=(int) xd; y[m]=(int) yd; m++;
190 | }
191 | /* compute rle encoding given y-boundary points */
192 | k=m; a=malloc(sizeof(uint)*(k+1));
193 | for( j=0; j0) b[m++]=a[j++]; else {
199 | j++; if(jm, p=0; long x; int more;
206 | char *s=malloc(sizeof(char)*m*6);
207 | for( i=0; icnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1;
209 | while( more ) {
210 | char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0;
211 | if(more) c |= 0x20; c+=48; s[p++]=c;
212 | }
213 | }
214 | s[p]=0; return s;
215 | }
216 |
217 | void rleFrString( RLE *R, char *s, siz h, siz w ) {
218 | siz m=0, p=0, k; long x; int more; uint *cnts;
219 | while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0;
220 | while( s[p] ) {
221 | x=0; k=0; more=1;
222 | while( more ) {
223 | char c=s[p]-48; x |= (c & 0x1f) << 5*k;
224 | more = c & 0x20; p++; k++;
225 | if(!more && (c & 0x10)) x |= -1 << 5*k;
226 | }
227 | if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x;
228 | }
229 | rleInit(R,h,w,m,cnts); free(cnts);
230 | }
231 |
--------------------------------------------------------------------------------
/utils/pycocotools/maskApi.h:
--------------------------------------------------------------------------------
1 | /**************************************************************************
2 | * Microsoft COCO Toolbox. version 2.0
3 | * Data, paper, and tutorials available at: http://mscoco.org/
4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
5 | * Licensed under the Simplified BSD License [see coco/license.txt]
6 | **************************************************************************/
7 | #pragma once
8 |
9 | typedef unsigned int uint;
10 | typedef unsigned long siz;
11 | typedef unsigned char byte;
12 | typedef double* BB;
13 | typedef struct { siz h, w, m; uint *cnts; } RLE;
14 |
15 | /* Initialize/destroy RLE. */
16 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts );
17 | void rleFree( RLE *R );
18 |
19 | /* Initialize/destroy RLE array. */
20 | void rlesInit( RLE **R, siz n );
21 | void rlesFree( RLE **R, siz n );
22 |
23 | /* Encode binary masks using RLE. */
24 | void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n );
25 |
26 | /* Decode binary masks encoded via RLE. */
27 | void rleDecode( const RLE *R, byte *mask, siz n );
28 |
29 | /* Compute union or intersection of encoded masks. */
30 | void rleMerge( const RLE *R, RLE *M, siz n, int intersect );
31 |
32 | /* Compute area of encoded masks. */
33 | void rleArea( const RLE *R, siz n, uint *a );
34 |
35 | /* Compute intersection over union between masks. */
36 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o );
37 |
38 | /* Compute non-maximum suppression between bounding masks */
39 | void rleNms( RLE *dt, siz n, uint *keep, double thr );
40 |
41 | /* Compute intersection over union between bounding boxes. */
42 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o );
43 |
44 | /* Compute non-maximum suppression between bounding boxes */
45 | void bbNms( BB dt, siz n, uint *keep, double thr );
46 |
47 | /* Get bounding boxes surrounding encoded masks. */
48 | void rleToBbox( const RLE *R, BB bb, siz n );
49 |
50 | /* Convert bounding boxes to encoded masks. */
51 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n );
52 |
53 | /* Convert polygon to encoded mask. */
54 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w );
55 |
56 | /* Get compressed string representation of encoded mask. */
57 | char* rleToString( const RLE *R );
58 |
59 | /* Convert from compressed string representation of encoded mask. */
60 | void rleFrString( RLE *R, char *s, siz h, siz w );
61 |
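
`rleEncode`/`rleDecode` above store a binary mask as alternating run lengths that always begin with the zero run, exactly as described in the `mask.py` header (M=[0 0 1 1 1 0 1] gives counts [2 3 1 1]). The following is a pure-Python sketch of that counting scheme for intuition only; it is not the C implementation and ignores the LEB128-style string compression.

```python
def rle_counts(bits):
    """Run lengths of a flat binary vector, always starting with the zero run,
    so even-indexed counts are runs of zeros and odd-indexed counts runs of ones."""
    counts, current, run = [], 0, 0
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)
            current, run = b, 1
    counts.append(run)
    return counts

print(rle_counts([0, 0, 1, 1, 1, 0, 1]))   # [2, 3, 1, 1]
print(rle_counts([1, 1, 1, 1, 1, 1, 0]))   # [0, 6, 1]
```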
--------------------------------------------------------------------------------
/utils/timer.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import time
9 |
10 |
11 | class Timer(object):
12 | """A simple timer."""
13 | def __init__(self):
14 | self.total_time = 0.
15 | self.calls = 0
16 | self.start_time = 0.
17 | self.diff = 0.
18 | self.average_time = 0.
19 |
20 | def tic(self):
21 |         # using time.time instead of time.clock because time.clock
22 |         # does not normalize for multithreading
23 | self.start_time = time.time()
24 |
25 | def toc(self, average=True):
26 | self.diff = time.time() - self.start_time
27 | self.total_time += self.diff
28 | self.calls += 1
29 | self.average_time = self.total_time / self.calls
30 | if average:
31 | return self.average_time
32 | else:
33 | return self.diff
34 |
35 | def clear(self):
36 | self.total_time = 0.
37 | self.calls = 0
38 | self.start_time = 0.
39 | self.diff = 0.
40 | self.average_time = 0.
41 |
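
A short usage sketch for `Timer`: `toc(average=True)` returns the running mean over all `tic`/`toc` pairs, while `toc(average=False)` returns only the last interval. The `time.sleep` call below simply stands in for whatever operation is being timed.

```python
import time
from utils.timer import Timer

timer = Timer()
for _ in range(3):
    timer.tic()
    time.sleep(0.01)                     # placeholder for a forward pass / NMS call
    last = timer.toc(average=False)      # duration of this iteration only
print('last call: {:.4f}s'.format(last))
print('mean of {} calls: {:.4f}s'.format(timer.calls, timer.average_time))
```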
--------------------------------------------------------------------------------