├── LICENSE ├── Original.jpg ├── README.md ├── coco_annotation.py ├── convert.py ├── darknet53.cfg ├── draw_bbox.py ├── font ├── FiraMono-Medium.otf └── SIL Open Font License.txt ├── kmeans.py ├── model_data ├── 5k.txt ├── coco_classes.txt ├── tiny_yolo_anchors.txt ├── voc_classes.txt └── yolo_anchors.txt ├── pycocoEval.py ├── train.py ├── train_bottleneck.py ├── voc_annotation.py ├── yolo.py ├── yolo3 ├── __init__.py ├── model.py └── utils.py ├── yolo_valid.py ├── yolo_video.py ├── yolov3-tiny.cfg └── yolov3.cfg /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 qqwweee 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Original.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HulkMaker/tensorflow-keras-yolov3/04a873529e9941a576e7058e47f8991d188ba15b/Original.jpg -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # tensorflow-keras-yolov3 2 | (cocoapi mAP计算在下方↓↓↓) 3 | [![license](https://img.shields.io/github/license/mashape/apistatus.svg)](LICENSE) 4 | 5 | --- 6 | 7 | ### Quick Start 8 | 9 | 1. The test environment is 10 | - cudatoolkit 9.2 11 | - cudnn 7.2.1 12 | - Python 3.6.8 13 | - Keras 2.2.0 14 | - tensorflow 1.10.0 15 | - pillow = 5.4.1 16 | - matplotlib 3.0.2 17 | 18 | 2. Download YOLOv3 weights from [YOLO website](http://pjreddie.com/darknet/yolo/). 19 | 3. Convert the Darknet YOLO model to a Keras model .h5 file. 20 | 4. Modified default converted model path in yolo.py line26 (default in '/home/common/pretrained_models/yolo.h5') 21 | 22 | ### Run single image detection demo 23 | ``` 24 | wget https://pjreddie.com/media/files/yolov3.weights 25 | python convert.py yolov3.cfg yolov3.weights /home/common/pretrained_models/yolo.h5 26 | python yolo_video.py [OPTIONS...] --image, for image detection mode, OR 27 | python yolo_video.py [video_path] [output_path (optional)] 28 | For Tiny YOLOv3, just do in a similar way, just specify model path and anchor path with `--model model_file` and `--anchors anchor_file`. 29 | ``` 30 | --- 31 | ### Calcualte mAP on cocoapi 32 | ``` 33 | 1. cd tensorflow-keras-yolov3 34 | 2. 
pip install cython # solution of issue:(gcc: error: pycocotools/_mask.c: No such file or directory) 35 | 3. sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools ./ 36 | 4. Use `python yolo_valid.py` to test the official YOLOv3 weights. 37 | ``` 38 | 39 | --- 40 | ### Other usage 41 | Use --help to see usage of yolo_video.py: 42 | ``` 43 | usage: yolo_video.py [-h] [--model MODEL] [--anchors ANCHORS] 44 | [--classes CLASSES] [--gpu_num GPU_NUM] [--image] 45 | [--input] [--output] 46 | 47 | positional arguments: 48 | --input Video input path 49 | --output Video output path 50 | 51 | optional arguments: 52 | -h, --help show this help message and exit 53 | --model MODEL path to model weight file, default model_data/yolo.h5 54 | --anchors ANCHORS path to anchor definitions, default 55 | model_data/yolo_anchors.txt 56 | --classes CLASSES path to class definitions, default 57 | model_data/coco_classes.txt 58 | --gpu_num GPU_NUM Number of GPU to use, default 1 59 | --image Image detection mode, will ignore all positional arguments 60 | ``` 61 | 4. MultiGPU usage: use `--gpu_num N` to use N GPUs. It is passed to the [Keras multi_gpu_model()](https://keras.io/utils/#multi_gpu_model). 62 | --- 63 | ### Training 64 | 65 | 1. Generate your own annotation file and class names file. 66 | One row for one image; 67 | Row format: `image_file_path box1 box2 ... boxN`; 68 | Box format: `x_min,y_min,x_max,y_max,class_id` (no space). 69 | For VOC dataset, try `python voc_annotation.py` 70 | Here is an example: 71 | ``` 72 | path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3 73 | path/to/img2.jpg 120,300,250,600,2 74 | ... 75 | ``` 76 | 77 | 2. Make sure you have run `python convert.py -w yolov3.cfg yolov3.weights model_data/yolo_weights.h5` 78 | The file model_data/yolo_weights.h5 is used to load pretrained weights. 79 | 80 | 3. Modify train.py and start training. 81 | `python train.py` 82 | Use your trained weights or checkpoint weights with command line option `--model model_file` when using yolo_video.py 83 | Remember to modify class path or anchor path, with `--classes class_file` and `--anchors anchor_file`. 84 | 85 | If you want to use original pretrained weights for YOLOv3: 86 | 1. `wget https://pjreddie.com/media/files/darknet53.conv.74` 87 | 2. rename it as darknet53.weights 88 | 3. `python convert.py -w darknet53.cfg darknet53.weights model_data/darknet53_weights.h5` 89 | 4. use model_data/darknet53_weights.h5 in train.py 90 | 91 | --- 92 | ### Some issues to know 93 | 94 | 95 | 96 | 1. Default anchors are used. If you use your own anchors, probably some changes are needed. 97 | 98 | 2. The inference result is not totally the same as Darknet but the difference is small. 99 | 100 | 3. The speed is slower than Darknet. Replacing PIL with opencv may help a little. 101 | 102 | 4. Always load pretrained weights and freeze layers in the first stage of training. Or try Darknet training. It's OK if there is a mismatch warning. 
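As a quick reference for the annotation format described in the Training section above, here is a minimal sketch (the helper name and example paths are only illustrative) that parses one annotation line back into its image path and boxes:
```
# Minimal sketch: parse one line of the annotation format
# "image_file_path box1 box2 ... boxN", box = "x_min,y_min,x_max,y_max,class_id".
def parse_annotation_line(line):
    parts = line.strip().split()
    image_path, boxes = parts[0], []
    for box in parts[1:]:
        x_min, y_min, x_max, y_max, class_id = map(int, box.split(','))
        boxes.append((x_min, y_min, x_max, y_max, class_id))
    return image_path, boxes

image_path, boxes = parse_annotation_line(
    "path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3")
print(image_path, boxes)  # path/to/img1.jpg [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)]
```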
103 | 104 | 105 | # tensorflow-keras-yolov3 106 | -------------------------------------------------------------------------------- /coco_annotation.py: -------------------------------------------------------------------------------- 1 | import json 2 | from collections import defaultdict 3 | 4 | name_box_id = defaultdict(list) 5 | id_name = dict() 6 | f = open( 7 | "mscoco2017/annotations/instances_train2017.json", 8 | encoding='utf-8') 9 | data = json.load(f) 10 | 11 | annotations = data['annotations'] 12 | for ant in annotations: 13 | id = ant['image_id'] 14 | name = 'mscoco2017/train2017/%012d.jpg' % id 15 | cat = ant['category_id'] 16 | 17 | if cat >= 1 and cat <= 11: 18 | cat = cat - 1 19 | elif cat >= 13 and cat <= 25: 20 | cat = cat - 2 21 | elif cat >= 27 and cat <= 28: 22 | cat = cat - 3 23 | elif cat >= 31 and cat <= 44: 24 | cat = cat - 5 25 | elif cat >= 46 and cat <= 65: 26 | cat = cat - 6 27 | elif cat == 67: 28 | cat = cat - 7 29 | elif cat == 70: 30 | cat = cat - 9 31 | elif cat >= 72 and cat <= 82: 32 | cat = cat - 10 33 | elif cat >= 84 and cat <= 90: 34 | cat = cat - 11 35 | 36 | name_box_id[name].append([ant['bbox'], cat]) 37 | 38 | f = open('train.txt', 'w') 39 | for key in name_box_id.keys(): 40 | f.write(key) 41 | box_infos = name_box_id[key] 42 | for info in box_infos: 43 | x_min = int(info[0][0]) 44 | y_min = int(info[0][1]) 45 | x_max = x_min + int(info[0][2]) 46 | y_max = y_min + int(info[0][3]) 47 | 48 | box_info = " %d,%d,%d,%d,%d" % ( 49 | x_min, y_min, x_max, y_max, int(info[1])) 50 | f.write(box_info) 51 | f.write('\n') 52 | f.close() 53 | -------------------------------------------------------------------------------- /convert.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | """ 3 | Created on April, 2019 4 | @authors: Hulking 5 | """ 6 | """ 7 | Reads Darknet config and weights and creates Keras model with TF backend. 8 | """ 9 | 10 | import argparse 11 | import configparser 12 | import io 13 | import os 14 | from collections import defaultdict 15 | 16 | import numpy as np 17 | from keras import backend as K 18 | from keras.layers import (Conv2D, Input, ZeroPadding2D, Add, 19 | UpSampling2D, MaxPooling2D, Concatenate) 20 | from keras.layers.advanced_activations import LeakyReLU 21 | from keras.layers.normalization import BatchNormalization 22 | from keras.models import Model 23 | from keras.regularizers import l2 24 | from keras.utils.vis_utils import plot_model as plot 25 | 26 | 27 | parser = argparse.ArgumentParser(description='Darknet To Keras Converter.') 28 | parser.add_argument('config_path', help='Path to Darknet cfg file.') 29 | parser.add_argument('weights_path', help='Path to Darknet weights file.') 30 | parser.add_argument('output_path', help='Path to output Keras model file.') 31 | parser.add_argument( 32 | '-p', 33 | '--plot_model', 34 | help='Plot generated Keras model and save as image.', 35 | action='store_true') 36 | parser.add_argument( 37 | '-w', 38 | '--weights_only', 39 | help='Save as Keras weights file instead of model file.', 40 | action='store_true') 41 | 42 | def unique_config_sections(config_file): 43 | """Convert all config sections to have unique names. 44 | 45 | Adds unique suffixes to config sections for compability with configparser. 
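    For example, the repeated [convolutional] sections of a Darknet cfg are
    renamed [convolutional_0], [convolutional_1], ... because configparser
    rejects duplicate section names.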
46 | """ 47 | section_counters = defaultdict(int) 48 | output_stream = io.StringIO() 49 | with open(config_file) as fin: 50 | for line in fin: 51 | if line.startswith('['): 52 | section = line.strip().strip('[]') 53 | _section = section + '_' + str(section_counters[section]) 54 | section_counters[section] += 1 55 | line = line.replace(section, _section) 56 | output_stream.write(line) 57 | output_stream.seek(0) 58 | return output_stream 59 | 60 | # %% 61 | def _main(args): 62 | config_path = os.path.expanduser(args.config_path) 63 | weights_path = os.path.expanduser(args.weights_path) 64 | assert config_path.endswith('.cfg'), '{} is not a .cfg file'.format( 65 | config_path) 66 | assert weights_path.endswith( 67 | '.weights'), '{} is not a .weights file'.format(weights_path) 68 | 69 | output_path = os.path.expanduser(args.output_path) 70 | assert output_path.endswith( 71 | '.h5'), 'output path {} is not a .h5 file'.format(output_path) 72 | output_root = os.path.splitext(output_path)[0] 73 | 74 | # Load weights and config. 75 | print('Loading weights.') 76 | weights_file = open(weights_path, 'rb') 77 | major, minor, revision = np.ndarray( 78 | shape=(3, ), dtype='int32', buffer=weights_file.read(12)) 79 | if (major*10+minor)>=2 and major<1000 and minor<1000: 80 | seen = np.ndarray(shape=(1,), dtype='int64', buffer=weights_file.read(8)) 81 | else: 82 | seen = np.ndarray(shape=(1,), dtype='int32', buffer=weights_file.read(4)) 83 | print('Weights Header: ', major, minor, revision, seen) 84 | 85 | print('Parsing Darknet config.') 86 | unique_config_file = unique_config_sections(config_path) 87 | cfg_parser = configparser.ConfigParser() 88 | cfg_parser.read_file(unique_config_file) 89 | 90 | print('Creating Keras model.') 91 | input_layer = Input(shape=(None, None, 3)) 92 | prev_layer = input_layer 93 | all_layers = [] 94 | 95 | weight_decay = float(cfg_parser['net_0']['decay'] 96 | ) if 'net_0' in cfg_parser.sections() else 5e-4 97 | count = 0 98 | out_index = [] 99 | for section in cfg_parser.sections(): 100 | print('Parsing section {}'.format(section)) 101 | if section.startswith('convolutional'): 102 | filters = int(cfg_parser[section]['filters']) 103 | size = int(cfg_parser[section]['size']) 104 | stride = int(cfg_parser[section]['stride']) 105 | pad = int(cfg_parser[section]['pad']) 106 | activation = cfg_parser[section]['activation'] 107 | batch_normalize = 'batch_normalize' in cfg_parser[section] 108 | 109 | padding = 'same' if pad == 1 and stride == 1 else 'valid' 110 | 111 | # Setting weights. 
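            # Note: when batch_normalize is set, the per-filter bias read below
            # is reused as the BatchNormalization beta (shift) and the Conv2D is
            # built with use_bias=False; otherwise it is the ordinary conv bias.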
112 | # Darknet serializes convolutional weights as: 113 | # [bias/beta, [gamma, mean, variance], conv_weights] 114 | prev_layer_shape = K.int_shape(prev_layer) 115 | 116 | weights_shape = (size, size, prev_layer_shape[-1], filters) 117 | darknet_w_shape = (filters, weights_shape[2], size, size) 118 | weights_size = np.product(weights_shape) 119 | 120 | print('conv2d', 'bn' 121 | if batch_normalize else ' ', activation, weights_shape) 122 | 123 | conv_bias = np.ndarray( 124 | shape=(filters, ), 125 | dtype='float32', 126 | buffer=weights_file.read(filters * 4)) 127 | count += filters 128 | 129 | if batch_normalize: 130 | bn_weights = np.ndarray( 131 | shape=(3, filters), 132 | dtype='float32', 133 | buffer=weights_file.read(filters * 12)) 134 | count += 3 * filters 135 | 136 | bn_weight_list = [ 137 | bn_weights[0], # scale gamma 138 | conv_bias, # shift beta 139 | bn_weights[1], # running mean 140 | bn_weights[2] # running var 141 | ] 142 | 143 | conv_weights = np.ndarray( 144 | shape=darknet_w_shape, 145 | dtype='float32', 146 | buffer=weights_file.read(weights_size * 4)) 147 | count += weights_size 148 | 149 | # DarkNet conv_weights are serialized Caffe-style: 150 | # (out_dim, in_dim, height, width) 151 | # We would like to set these to Tensorflow order: 152 | # (height, width, in_dim, out_dim) 153 | conv_weights = np.transpose(conv_weights, [2, 3, 1, 0]) 154 | conv_weights = [conv_weights] if batch_normalize else [ 155 | conv_weights, conv_bias 156 | ] 157 | 158 | # Handle activation. 159 | act_fn = None 160 | if activation == 'leaky': 161 | pass # Add advanced activation later. 162 | elif activation != 'linear': 163 | raise ValueError( 164 | 'Unknown activation function `{}` in section {}'.format( 165 | activation, section)) 166 | 167 | # Create Conv2D layer 168 | if stride>1: 169 | # Darknet uses left and top padding instead of 'same' mode 170 | prev_layer = ZeroPadding2D(((1,0),(1,0)))(prev_layer) 171 | conv_layer = (Conv2D( 172 | filters, (size, size), 173 | strides=(stride, stride), 174 | kernel_regularizer=l2(weight_decay), 175 | use_bias=not batch_normalize, 176 | weights=conv_weights, 177 | activation=act_fn, 178 | padding=padding))(prev_layer) 179 | 180 | if batch_normalize: 181 | conv_layer = (BatchNormalization( 182 | weights=bn_weight_list))(conv_layer) 183 | prev_layer = conv_layer 184 | 185 | if activation == 'linear': 186 | all_layers.append(prev_layer) 187 | elif activation == 'leaky': 188 | act_layer = LeakyReLU(alpha=0.1)(prev_layer) 189 | prev_layer = act_layer 190 | all_layers.append(act_layer) 191 | 192 | elif section.startswith('route'): 193 | ids = [int(i) for i in cfg_parser[section]['layers'].split(',')] 194 | layers = [all_layers[i] for i in ids] 195 | if len(layers) > 1: 196 | print('Concatenating route layers:', layers) 197 | concatenate_layer = Concatenate()(layers) 198 | all_layers.append(concatenate_layer) 199 | prev_layer = concatenate_layer 200 | else: 201 | skip_layer = layers[0] # only one layer to route 202 | all_layers.append(skip_layer) 203 | prev_layer = skip_layer 204 | 205 | elif section.startswith('maxpool'): 206 | size = int(cfg_parser[section]['size']) 207 | stride = int(cfg_parser[section]['stride']) 208 | all_layers.append( 209 | MaxPooling2D( 210 | pool_size=(size, size), 211 | strides=(stride, stride), 212 | padding='same')(prev_layer)) 213 | prev_layer = all_layers[-1] 214 | 215 | elif section.startswith('shortcut'): 216 | index = int(cfg_parser[section]['from']) 217 | activation = cfg_parser[section]['activation'] 218 | assert 
activation == 'linear', 'Only linear activation supported.' 219 | all_layers.append(Add()([all_layers[index], prev_layer])) 220 | prev_layer = all_layers[-1] 221 | 222 | elif section.startswith('upsample'): 223 | stride = int(cfg_parser[section]['stride']) 224 | assert stride == 2, 'Only stride=2 supported.' 225 | all_layers.append(UpSampling2D(stride)(prev_layer)) 226 | prev_layer = all_layers[-1] 227 | 228 | elif section.startswith('yolo'): 229 | out_index.append(len(all_layers)-1) 230 | all_layers.append(None) 231 | prev_layer = all_layers[-1] 232 | 233 | elif section.startswith('net'): 234 | pass 235 | 236 | else: 237 | raise ValueError( 238 | 'Unsupported section header type: {}'.format(section)) 239 | 240 | # Create and save model. 241 | if len(out_index)==0: out_index.append(len(all_layers)-1) 242 | model = Model(inputs=input_layer, outputs=[all_layers[i] for i in out_index]) 243 | print(model.summary()) 244 | if args.weights_only: 245 | model.save_weights('{}'.format(output_path)) 246 | print('Saved Keras weights to {}'.format(output_path)) 247 | else: 248 | model.save('{}'.format(output_path)) 249 | print('Saved Keras model to {}'.format(output_path)) 250 | 251 | # Check to see if all weights have been read. 252 | remaining_weights = len(weights_file.read()) / 4 253 | weights_file.close() 254 | print('Read {} of {} from Darknet weights.'.format(count, count + 255 | remaining_weights)) 256 | if remaining_weights > 0: 257 | print('Warning: {} unused weights'.format(remaining_weights)) 258 | 259 | if args.plot_model: 260 | plot(model, to_file='{}.png'.format(output_root), show_shapes=True) 261 | print('Saved model plot to {}.png'.format(output_root)) 262 | 263 | 264 | if __name__ == '__main__': 265 | _main(parser.parse_args()) 266 | -------------------------------------------------------------------------------- /darknet53.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | 
filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | 
[convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 
543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | -------------------------------------------------------------------------------- /draw_bbox.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | """ 4 | Created on April, 2019 5 | @authors: Hulking 6 | """ 7 | import os 8 | import cv2 9 | import glob 10 | import numpy as np 11 | from keras.preprocessing.image import load_img, img_to_array 12 | from keras.applications.imagenet_utils import preprocess_input as preprocess 13 | from pycocotools.coco import COCO 14 | import skimage.io as io 15 | import matplotlib.pyplot as plt 16 | import matplotlib.patches as patches 17 | import pylab 18 | from PIL import Image, ImageFont, ImageDraw 19 | 20 | """ 21 | 路径定义 22 | """ 23 | #path list 24 | anchors_path='./model_data/yolo_anchors.txt' 25 | classes_path='./model_data/coco_classes.txt' 26 | img_list_path='./model_data/5k.txt' 27 | img_list_dir="/home/common/datasets/coco/" 28 | imgs_path='/home/common/datasets/coco/images/val2014/' 29 | gt_folder= '/home/common/datasets/coco/annotations/' 30 | res_path='./results/cocoapi_results.json' 31 | res_dir='./results/' 32 | res_imgs_path='./results/pics/' 33 | # load and display instance annotations 34 | dataDir='/home/common/datasets/coco' 35 | dataType='val2014' 36 | annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType) 37 | print ("annFile",annFile) 38 | coco=COCO(annFile) 39 | 40 | 41 | """ 42 | 画框函数 43 | """ 44 | def draw_rectangle(draw, coordinates, color, width=1): 45 | for i in range(width): 46 | rect_start = (coordinates[0][0] - i, coordinates[0][1] - i) 47 | rect_end = (coordinates[1][0] + i, coordinates[1][1] + i) 48 | draw.rectangle((rect_start, rect_end), outline = color) 49 | 50 | 51 | """ 52 | 区分不同类别框的颜色 53 | """ 54 | def id_to_color(id): 55 | #id=id & 63 56 | num=id+1 57 | R=(num%2)*10+(num>>1)%2 58 | G=((num>>1)%2)*10+(num>>1)%2 59 | B=((num>>1)%2)*10+(num>>1)%2 60 | R=(id%7)*13+R*4 61 | G=(id%8)*18+G*6 62 | B=(id%5)*17+B*9 63 | return R,G,B 64 | 65 | 66 | with open(classes_path) as f: 67 | obj_list = f.readlines() 68 | obj_list = [x.strip() for x in obj_list] 69 | 70 | coco_ids= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 71 | 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 72 | 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 73 | 89, 90] 74 | 75 | 76 | """ 77 | 图片结果绘制 78 | """ 79 | with open(res_dir+"cocoapi_results.json") as rf: 80 | rf_list=rf.readlines() 81 | rf_list=[x.strip() for x in rf_list] 82 | 83 | with open(img_list_dir+"5k.txt") as f: 84 | total_img_list = f.readlines() 85 | # remove whitespace characters like `\n` at the end of each line 86 | total_img_list = [x.strip() for x in total_img_list] 87 | total_num_t_img = len(total_img_list) 88 | print("number of images in 5k list: ", total_num_t_img) 89 | gt_num = 0 90 | 91 | for image_path in total_img_list: 92 | gt_num += 1 93 | print(image_path) 94 | img=Image.open(image_path) 95 | draw =ImageDraw.Draw(img) 96 | image_name=int(image_path[50:56]) 97 | print("image_name:",image_name) 98 | 99 | #draw GT bbox,class name,score 100 | imgIds = coco.getImgIds(imgIds = [image_name]) 101 | annIds = coco.getAnnIds(imgIds, iscrowd=None) 102 | anns = coco.loadAnns(annIds) 103 | # coco.showAnns(anns) 104 | # plt.show() 105 | for n in 
range(len(anns)): 106 | print (n) 107 | x, y, w, h = anns[n]['bbox'] 108 | x, y, w, h = int(x), int(y), int(w), int(h) 109 | cat=anns[n]['category_id'] 110 | print("gt-obj:",obj_list[coco_ids.index(cat)]) 111 | print(cat,x, y, w, h) 112 | draw_rectangle(draw, ((x, y), (x + w, y + h)), color=(0,255,0), width=outline_width) 113 | draw.text((x, y-offset_y), obj_list[coco_ids.index(cat)], font=setFont,fill=(0,255,0), width= 0.3) 114 | 115 | 116 | rf_num=0 117 | for rf_dict in rf_list: 118 | rf_dict=ast.literal_eval(rf_dict) 119 | 120 | rf_id=rf_dict['image_id'] 121 | if image_name==rf_id: 122 | rf_num+=1 123 | # print("image_name==rf_id",rf_id) 124 | # print ("rf_index=",rf_num) 125 | x, y, w, h = rf_dict['bbox'] 126 | x, y, w, h = int(x), int(y), int(w), int(h) 127 | print(x, y, w, h) 128 | obj_name=obj_list[coco_ids.index(rf_dict['category_id'])] 129 | print_content=obj_name+" conf:"+str(round(rf_dict['score'],3)) 130 | #outline_width = int(x*y/2000) 131 | outline_width=4 132 | outline_color = id_to_color(rf_dict['category_id']) 133 | #draw_rectangle(draw, ((x, y), (x + w, y + h)), color=outline_color, width=outline_width) 134 | setFont= ImageFont.truetype("./font/FiraMono-Medium.otf", 25, encoding="unic") 135 | offset_y=30 136 | #draw.text((x, y-offset_y), obj_name, font=setFont,fill=id_to_color(rf_dict['category_id']), width= 0.5) 137 | draw_rectangle(draw, ((x, y), (x + w, y + h)), color=(255,0,0), width=outline_width) 138 | draw.text((x, y-offset_y), print_content, font=setFont,fill=(255,0,0), width= 0.3) 139 | img.save(res_imgs_path+str(image_name).zfill(6)+'.jpg') 140 | plt.imshow(img) 141 | plt.show() 142 | print("number of images with gt in 5k list: ", gt_num) 143 | 144 | 145 | -------------------------------------------------------------------------------- /font/FiraMono-Medium.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HulkMaker/tensorflow-keras-yolov3/04a873529e9941a576e7058e47f8991d188ba15b/font/FiraMono-Medium.otf -------------------------------------------------------------------------------- /font/SIL Open Font License.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014, Mozilla Foundation https://mozilla.org/ with Reserved Font Name Fira Mono. 2 | 3 | Copyright (c) 2014, Telefonica S.A. 4 | 5 | This Font Software is licensed under the SIL Open Font License, Version 1.1. 6 | This license is copied below, and is also available with a FAQ at: http://scripts.sil.org/OFL 7 | 8 | ----------------------------------------------------------- 9 | SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007 10 | ----------------------------------------------------------- 11 | 12 | PREAMBLE 13 | The goals of the Open Font License (OFL) are to stimulate worldwide development of collaborative font projects, to support the font creation efforts of academic and linguistic communities, and to provide a free and open framework in which fonts may be shared and improved in partnership with others. 14 | 15 | The OFL allows the licensed fonts to be used, studied, modified and redistributed freely as long as they are not sold by themselves. The fonts, including any derivative works, can be bundled, embedded, redistributed and/or sold with any software provided that any reserved names are not used by derivative works. The fonts and derivatives, however, cannot be released under any other type of license. 
The requirement for fonts to remain under this license does not apply to any document created using the fonts or their derivatives. 16 | 17 | DEFINITIONS 18 | "Font Software" refers to the set of files released by the Copyright Holder(s) under this license and clearly marked as such. This may include source files, build scripts and documentation. 19 | 20 | "Reserved Font Name" refers to any names specified as such after the copyright statement(s). 21 | 22 | "Original Version" refers to the collection of Font Software components as distributed by the Copyright Holder(s). 23 | 24 | "Modified Version" refers to any derivative made by adding to, deleting, or substituting -- in part or in whole -- any of the components of the Original Version, by changing formats or by porting the Font Software to a new environment. 25 | 26 | "Author" refers to any designer, engineer, programmer, technical writer or other person who contributed to the Font Software. 27 | 28 | PERMISSION & CONDITIONS 29 | Permission is hereby granted, free of charge, to any person obtaining a copy of the Font Software, to use, study, copy, merge, embed, modify, redistribute, and sell modified and unmodified copies of the Font Software, subject to the following conditions: 30 | 31 | 1) Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself. 32 | 33 | 2) Original or Modified Versions of the Font Software may be bundled, redistributed and/or sold with any software, provided that each copy contains the above copyright notice and this license. These can be included either as stand-alone text files, human-readable headers or in the appropriate machine-readable metadata fields within text or binary files as long as those fields can be easily viewed by the user. 34 | 35 | 3) No Modified Version of the Font Software may use the Reserved Font Name(s) unless explicit written permission is granted by the corresponding Copyright Holder. This restriction only applies to the primary font name as presented to the users. 36 | 37 | 4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font Software shall not be used to promote, endorse or advertise any Modified Version, except to acknowledge the contribution(s) of the Copyright Holder(s) and the Author(s) or with their explicit written permission. 38 | 39 | 5) The Font Software, modified or unmodified, in part or in whole, must be distributed entirely under this license, and must not be distributed under any other license. The requirement for fonts to remain under this license does not apply to any document created using the Font Software. 40 | 41 | TERMINATION 42 | This license becomes null and void if any of the above conditions are not met. 43 | 44 | DISCLAIMER 45 | THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER DEALINGS IN THE FONT SOFTWARE. 
-------------------------------------------------------------------------------- /kmeans.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | import numpy as np 6 | """ 7 | 使用K-means算法计算锚点的最优选择 8 | """ 9 | class YOLO_Kmeans: 10 | 11 | def __init__(self, cluster_number, filename): 12 | self.cluster_number = cluster_number 13 | self.filename = "2012_train.txt" 14 | 15 | def iou(self, boxes, clusters): # 1 box -> k clusters 16 | n = boxes.shape[0] 17 | k = self.cluster_number 18 | 19 | box_area = boxes[:, 0] * boxes[:, 1] 20 | box_area = box_area.repeat(k) 21 | box_area = np.reshape(box_area, (n, k)) 22 | 23 | cluster_area = clusters[:, 0] * clusters[:, 1] 24 | cluster_area = np.tile(cluster_area, [1, n]) 25 | cluster_area = np.reshape(cluster_area, (n, k)) 26 | 27 | box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k)) 28 | cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k)) 29 | min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix) 30 | 31 | box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k)) 32 | cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k)) 33 | min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix) 34 | inter_area = np.multiply(min_w_matrix, min_h_matrix) 35 | 36 | result = inter_area / (box_area + cluster_area - inter_area) 37 | return result 38 | 39 | def avg_iou(self, boxes, clusters): 40 | accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)]) 41 | return accuracy 42 | 43 | def kmeans(self, boxes, k, dist=np.median): 44 | box_number = boxes.shape[0] 45 | distances = np.empty((box_number, k)) 46 | last_nearest = np.zeros((box_number,)) 47 | np.random.seed() 48 | clusters = boxes[np.random.choice( 49 | box_number, k, replace=False)] # init k clusters 50 | while True: 51 | 52 | distances = 1 - self.iou(boxes, clusters) 53 | 54 | current_nearest = np.argmin(distances, axis=1) 55 | if (last_nearest == current_nearest).all(): 56 | break # clusters won't change 57 | for cluster in range(k): 58 | clusters[cluster] = dist( # update clusters 59 | boxes[current_nearest == cluster], axis=0) 60 | 61 | last_nearest = current_nearest 62 | 63 | return clusters 64 | 65 | def result2txt(self, data): 66 | f = open("yolo_anchors.txt", 'w') 67 | row = np.shape(data)[0] 68 | for i in range(row): 69 | if i == 0: 70 | x_y = "%d,%d" % (data[i][0], data[i][1]) 71 | else: 72 | x_y = ", %d,%d" % (data[i][0], data[i][1]) 73 | f.write(x_y) 74 | f.close() 75 | 76 | def txt2boxes(self): 77 | f = open(self.filename, 'r') 78 | dataSet = [] 79 | for line in f: 80 | infos = line.split(" ") 81 | length = len(infos) 82 | for i in range(1, length): 83 | width = int(infos[i].split(",")[2]) - \ 84 | int(infos[i].split(",")[0]) 85 | height = int(infos[i].split(",")[3]) - \ 86 | int(infos[i].split(",")[1]) 87 | dataSet.append([width, height]) 88 | result = np.array(dataSet) 89 | f.close() 90 | return result 91 | 92 | def txt2clusters(self): 93 | all_boxes = self.txt2boxes() 94 | result = self.kmeans(all_boxes, k=self.cluster_number) 95 | result = result[np.lexsort(result.T[0, None])] 96 | self.result2txt(result) 97 | print("K anchors:\n {}".format(result)) 98 | print("Accuracy: {:.2f}%".format( 99 | self.avg_iou(all_boxes, result) * 100)) 100 | 101 | 102 | if __name__ == "__main__": 103 | cluster_number = 9 104 | filename = "2012_train.txt" 105 | kmeans = YOLO_Kmeans(cluster_number, filename) 106 | kmeans.txt2clusters() 107 | 
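# Usage note (sketch): the anchors written to yolo_anchors.txt here can replace
# model_data/yolo_anchors.txt so that train.py and yolo_video.py pick up the new
# anchors; per the README, custom anchors may require further changes elsewhere.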
-------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /model_data/tiny_yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 2 | -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor 21 | -------------------------------------------------------------------------------- /model_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 2 | -------------------------------------------------------------------------------- /pycocoEval.py: -------------------------------------------------------------------------------- 1 | #-*- coding:utf-8 -*- 2 | """ 3 | Created on April, 2019 4 | @authors: Hulking 5 | """ 6 | """ 7 | 计算mAP 8 | """ 9 | import matplotlib.pyplot as plt 10 | from pycocotools.coco import COCO 11 | from pycocotools.cocoeval import COCOeval 12 | import numpy as np 13 | import skimage.io as io 14 | import pylab,json 15 | pylab.rcParams['figure.figsize'] = (10.0, 8.0) 16 | def get_img_id(file_name): 17 | ls = [] 18 | myset = [] 19 | annos = json.load(open(file_name, 'r')) 20 | for anno in annos: 21 | ls.append(anno['image_id']) 22 | myset = {}.fromkeys(ls).keys() 23 | return myset 24 | def cal_coco_map(): 25 | annType = ['segm', 'bbox', 'keypoints'] 26 | annType = annType[1] 27 | cocoGt_file = '/home/common/datasets/coco/annotations/instances_val2014.json' 28 | cocoGt = COCO(cocoGt_file) 29 | cocoDt_file = './results/cocoapi_results.json' 30 | imgIds = get_img_id(cocoDt_file) 31 | print (len(imgIds)) 32 | cocoDt = cocoGt.loadRes(cocoDt_file) 33 | imgIds = sorted(imgIds) 34 | imgIds = imgIds[0:5000] 35 | cocoEval = COCOeval(cocoGt, cocoDt, annType) 36 | cocoEval.params.imgIds = imgIds 37 | cocoEval.evaluate() 38 | cocoEval.accumulate() 39 | cocoEval.summarize() 40 | 41 | if __name__ == '__main__': 42 | 
cal_coco_map() -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | """ 2 | @authors: Hulking 3 | April 2019 4 | """ 5 | 6 | import numpy as np 7 | import keras.backend as K 8 | from keras.layers import Input, Lambda 9 | from keras.models import Model 10 | from keras.optimizers import Adam 11 | from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping 12 | 13 | from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss 14 | from yolo3.utils import get_random_data 15 | 16 | """ 17 | 训练流程 18 | """ 19 | def _main(): 20 | annotation_path = 'train.txt' 21 | log_dir = 'logs/000/' 22 | classes_path = 'model_data/voc_classes.txt' 23 | anchors_path = 'model_data/yolo_anchors.txt' 24 | class_names = get_classes(classes_path) 25 | num_classes = len(class_names) 26 | anchors = get_anchors(anchors_path) 27 | 28 | input_shape = (416,416) # multiple of 32, hw 29 | 30 | is_tiny_version = len(anchors)==6 # default setting 31 | if is_tiny_version: 32 | model = create_tiny_model(input_shape, anchors, num_classes, 33 | freeze_body=2, weights_path='model_data/tiny_yolo_weights.h5') 34 | else: 35 | model = create_model(input_shape, anchors, num_classes, 36 | freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze 37 | 38 | logging = TensorBoard(log_dir=log_dir) 39 | checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', 40 | monitor='val_loss', save_weights_only=True, save_best_only=True, period=3) 41 | reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1) 42 | early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1) 43 | 44 | val_split = 0.1 45 | with open(annotation_path) as f: 46 | lines = f.readlines() 47 | np.random.seed(10101) 48 | np.random.shuffle(lines) 49 | np.random.seed(None) 50 | num_val = int(len(lines)*val_split) 51 | num_train = len(lines) - num_val 52 | 53 | # Train with frozen layers first, to get a stable loss. 54 | # Adjust num epochs to your dataset. This step is enough to obtain a not bad model. 55 | if True: 56 | model.compile(optimizer=Adam(lr=1e-3), loss={ 57 | # use custom yolo_loss Lambda layer. 58 | 'yolo_loss': lambda y_true, y_pred: y_pred}) 59 | 60 | batch_size = 32 61 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 62 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 63 | steps_per_epoch=max(1, num_train//batch_size), 64 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 65 | validation_steps=max(1, num_val//batch_size), 66 | epochs=50, 67 | initial_epoch=0, 68 | callbacks=[logging, checkpoint]) 69 | model.save_weights(log_dir + 'trained_weights_stage_1.h5') 70 | 71 | # Unfreeze and continue training, to fine-tune. 72 | # Train longer if the result is not good. 
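    # Note: in both training stages the model's output is already the scalar
    # yolo_loss (a Lambda layer added in create_model), so the Keras loss simply
    # returns y_pred and the data generator feeds zeros as the dummy y_true.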
73 | if True: 74 | for i in range(len(model.layers)): 75 | model.layers[i].trainable = True 76 | model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change 77 | print('Unfreeze all of the layers.') 78 | 79 | batch_size = 32 # note that more GPU memory is required after unfreezing the body 80 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 81 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 82 | steps_per_epoch=max(1, num_train//batch_size), 83 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 84 | validation_steps=max(1, num_val//batch_size), 85 | epochs=100, 86 | initial_epoch=50, 87 | callbacks=[logging, checkpoint, reduce_lr, early_stopping]) 88 | model.save_weights(log_dir + 'trained_weights_final.h5') 89 | 90 | # Further training if needed. 91 | 92 | """ 93 | 获取类别 94 | """ 95 | def get_classes(classes_path): 96 | '''loads the classes''' 97 | with open(classes_path) as f: 98 | class_names = f.readlines() 99 | class_names = [c.strip() for c in class_names] 100 | return class_names 101 | """ 102 | 获取锚点 103 | """ 104 | def get_anchors(anchors_path): 105 | '''loads the anchors from a file''' 106 | with open(anchors_path) as f: 107 | anchors = f.readline() 108 | anchors = [float(x) for x in anchors.split(',')] 109 | return np.array(anchors).reshape(-1, 2) 110 | 111 | """ 112 | 创建模型 113 | """ 114 | def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2, 115 | weights_path='model_data/yolo_weights.h5'): 116 | '''create the training model''' 117 | K.clear_session() # get a new session 118 | image_input = Input(shape=(None, None, 3)) 119 | h, w = input_shape 120 | num_anchors = len(anchors) 121 | 122 | y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \ 123 | num_anchors//3, num_classes+5)) for l in range(3)] 124 | 125 | model_body = yolo_body(image_input, num_anchors//3, num_classes) 126 | print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes)) 127 | 128 | if load_pretrained: 129 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True) 130 | print('Load weights {}.'.format(weights_path)) 131 | if freeze_body in [1, 2]: 132 | # Freeze darknet53 body or freeze all but 3 output layers. 
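        # freeze_body=1 freezes the first 185 layers (the DarkNet-53 body);
        # freeze_body=2 freezes everything except the last 3 detection layers.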
133 | num = (185, len(model_body.layers)-3)[freeze_body-1] 134 | for i in range(num): model_body.layers[i].trainable = False 135 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 136 | 137 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 138 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})( 139 | [*model_body.output, *y_true]) 140 | model = Model([model_body.input, *y_true], model_loss) 141 | 142 | return model 143 | """ 144 | 创建阉割版模型 145 | """ 146 | def create_tiny_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2, 147 | weights_path='model_data/tiny_yolo_weights.h5'): 148 | '''create the training model, for Tiny YOLOv3''' 149 | K.clear_session() # get a new session 150 | image_input = Input(shape=(None, None, 3)) 151 | h, w = input_shape 152 | num_anchors = len(anchors) 153 | 154 | y_true = [Input(shape=(h//{0:32, 1:16}[l], w//{0:32, 1:16}[l], \ 155 | num_anchors//2, num_classes+5)) for l in range(2)] 156 | 157 | model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes) 158 | print('Create Tiny YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes)) 159 | 160 | if load_pretrained: 161 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True) 162 | print('Load weights {}.'.format(weights_path)) 163 | if freeze_body in [1, 2]: 164 | # Freeze the darknet body or freeze all but 2 output layers. 165 | num = (20, len(model_body.layers)-2)[freeze_body-1] 166 | for i in range(num): model_body.layers[i].trainable = False 167 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 168 | 169 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 170 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.7})( 171 | [*model_body.output, *y_true]) 172 | model = Model([model_body.input, *y_true], model_loss) 173 | 174 | return model 175 | """ 176 | 真值数据生成 177 | """ 178 | def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes): 179 | '''data generator for fit_generator''' 180 | n = len(annotation_lines) 181 | i = 0 182 | while True: 183 | image_data = [] 184 | box_data = [] 185 | for b in range(batch_size): 186 | if i==0: 187 | np.random.shuffle(annotation_lines) 188 | image, box = get_random_data(annotation_lines[i], input_shape, random=True) 189 | image_data.append(image) 190 | box_data.append(box) 191 | i = (i+1) % n 192 | image_data = np.array(image_data) 193 | box_data = np.array(box_data) 194 | y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) 195 | yield [image_data, *y_true], np.zeros(batch_size) 196 | 197 | def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes): 198 | n = len(annotation_lines) 199 | if n==0 or batch_size<=0: return None 200 | return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes) 201 | 202 | if __name__ == '__main__': 203 | _main() 204 | -------------------------------------------------------------------------------- /train_bottleneck.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | """ 6 | Retrain the YOLO model for your own dataset. 
7 | """ 8 | import os 9 | import numpy as np 10 | import keras.backend as K 11 | from keras.layers import Input, Lambda 12 | from keras.models import Model 13 | from keras.optimizers import Adam 14 | from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping 15 | 16 | from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss 17 | from yolo3.utils import get_random_data 18 | 19 | 20 | def _main(): 21 | annotation_path = 'train.txt' 22 | log_dir = 'logs/000/' 23 | classes_path = 'model_data/coco_classes.txt' 24 | anchors_path = 'model_data/yolo_anchors.txt' 25 | class_names = get_classes(classes_path) 26 | num_classes = len(class_names) 27 | anchors = get_anchors(anchors_path) 28 | 29 | input_shape = (416,416) # multiple of 32, hw 30 | 31 | model, bottleneck_model, last_layer_model = create_model(input_shape, anchors, num_classes, 32 | freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze 33 | 34 | logging = TensorBoard(log_dir=log_dir) 35 | checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', 36 | monitor='val_loss', save_weights_only=True, save_best_only=True, period=3) 37 | reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1) 38 | early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1) 39 | 40 | val_split = 0.1 41 | with open(annotation_path) as f: 42 | lines = f.readlines() 43 | np.random.seed(10101) 44 | np.random.shuffle(lines) 45 | np.random.seed(None) 46 | num_val = int(len(lines)*val_split) 47 | num_train = len(lines) - num_val 48 | 49 | # Train with frozen layers first, to get a stable loss. 50 | # Adjust num epochs to your dataset. This step is enough to obtain a not bad model. 
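    # Stage outline: (1) run the frozen body once over all images and cache its
    # three output feature maps ("bottlenecks") in bottlenecks.npz, (2) train only
    # the last detection layers on those cached features, (3) train the partially
    # frozen model on augmented data, then unfreeze everything and fine-tune.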
51 | if True: 52 | # perform bottleneck training 53 | if not os.path.isfile("bottlenecks.npz"): 54 | print("calculating bottlenecks") 55 | batch_size=8 56 | bottlenecks=bottleneck_model.predict_generator(data_generator_wrapper(lines, batch_size, input_shape, anchors, num_classes, random=False, verbose=True), 57 | steps=(len(lines)//batch_size)+1, max_queue_size=1) 58 | np.savez("bottlenecks.npz", bot0=bottlenecks[0], bot1=bottlenecks[1], bot2=bottlenecks[2]) 59 | 60 | # load bottleneck features from file 61 | dict_bot=np.load("bottlenecks.npz") 62 | bottlenecks_train=[dict_bot["bot0"][:num_train], dict_bot["bot1"][:num_train], dict_bot["bot2"][:num_train]] 63 | bottlenecks_val=[dict_bot["bot0"][num_train:], dict_bot["bot1"][num_train:], dict_bot["bot2"][num_train:]] 64 | 65 | # train last layers with fixed bottleneck features 66 | batch_size=8 67 | print("Training last layers with bottleneck features") 68 | print('with {} samples, val on {} samples and batch size {}.'.format(num_train, num_val, batch_size)) 69 | last_layer_model.compile(optimizer='adam', loss={'yolo_loss': lambda y_true, y_pred: y_pred}) 70 | last_layer_model.fit_generator(bottleneck_generator(lines[:num_train], batch_size, input_shape, anchors, num_classes, bottlenecks_train), 71 | steps_per_epoch=max(1, num_train//batch_size), 72 | validation_data=bottleneck_generator(lines[num_train:], batch_size, input_shape, anchors, num_classes, bottlenecks_val), 73 | validation_steps=max(1, num_val//batch_size), 74 | epochs=30, 75 | initial_epoch=0, max_queue_size=1) 76 | model.save_weights(log_dir + 'trained_weights_stage_0.h5') 77 | 78 | # train last layers with random augmented data 79 | model.compile(optimizer=Adam(lr=1e-3), loss={ 80 | # use custom yolo_loss Lambda layer. 81 | 'yolo_loss': lambda y_true, y_pred: y_pred}) 82 | batch_size = 16 83 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 84 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 85 | steps_per_epoch=max(1, num_train//batch_size), 86 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 87 | validation_steps=max(1, num_val//batch_size), 88 | epochs=50, 89 | initial_epoch=0, 90 | callbacks=[logging, checkpoint]) 91 | model.save_weights(log_dir + 'trained_weights_stage_1.h5') 92 | 93 | # Unfreeze and continue training, to fine-tune. 94 | # Train longer if the result is not good. 95 | if True: 96 | for i in range(len(model.layers)): 97 | model.layers[i].trainable = True 98 | model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change 99 | print('Unfreeze all of the layers.') 100 | 101 | batch_size = 4 # note that more GPU memory is required after unfreezing the body 102 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 103 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 104 | steps_per_epoch=max(1, num_train//batch_size), 105 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 106 | validation_steps=max(1, num_val//batch_size), 107 | epochs=100, 108 | initial_epoch=50, 109 | callbacks=[logging, checkpoint, reduce_lr, early_stopping]) 110 | model.save_weights(log_dir + 'trained_weights_final.h5') 111 | 112 | # Further training if needed. 
113 | 114 | 115 | def get_classes(classes_path): 116 | '''loads the classes''' 117 | with open(classes_path) as f: 118 | class_names = f.readlines() 119 | class_names = [c.strip() for c in class_names] 120 | return class_names 121 | 122 | def get_anchors(anchors_path): 123 | '''loads the anchors from a file''' 124 | with open(anchors_path) as f: 125 | anchors = f.readline() 126 | anchors = [float(x) for x in anchors.split(',')] 127 | return np.array(anchors).reshape(-1, 2) 128 | 129 | 130 | def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2, 131 | weights_path='model_data/yolo_weights.h5'): 132 | '''create the training model''' 133 | K.clear_session() # get a new session 134 | image_input = Input(shape=(None, None, 3)) 135 | h, w = input_shape 136 | num_anchors = len(anchors) 137 | 138 | y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \ 139 | num_anchors//3, num_classes+5)) for l in range(3)] 140 | 141 | model_body = yolo_body(image_input, num_anchors//3, num_classes) 142 | print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes)) 143 | 144 | if load_pretrained: 145 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True) 146 | print('Load weights {}.'.format(weights_path)) 147 | if freeze_body in [1, 2]: 148 | # Freeze darknet53 body or freeze all but 3 output layers. 149 | num = (185, len(model_body.layers)-3)[freeze_body-1] 150 | for i in range(num): model_body.layers[i].trainable = False 151 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 152 | 153 | # get output of second last layers and create bottleneck model of it 154 | out1=model_body.layers[246].output 155 | out2=model_body.layers[247].output 156 | out3=model_body.layers[248].output 157 | bottleneck_model = Model([model_body.input, *y_true], [out1, out2, out3]) 158 | 159 | # create last layer model of last layers from yolo model 160 | in0 = Input(shape=bottleneck_model.output[0].shape[1:].as_list()) 161 | in1 = Input(shape=bottleneck_model.output[1].shape[1:].as_list()) 162 | in2 = Input(shape=bottleneck_model.output[2].shape[1:].as_list()) 163 | last_out0=model_body.layers[249](in0) 164 | last_out1=model_body.layers[250](in1) 165 | last_out2=model_body.layers[251](in2) 166 | model_last=Model(inputs=[in0, in1, in2], outputs=[last_out0, last_out1, last_out2]) 167 | model_loss_last =Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 168 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})( 169 | [*model_last.output, *y_true]) 170 | last_layer_model = Model([in0,in1,in2, *y_true], model_loss_last) 171 | 172 | 173 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 174 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})( 175 | [*model_body.output, *y_true]) 176 | model = Model([model_body.input, *y_true], model_loss) 177 | 178 | return model, bottleneck_model, last_layer_model 179 | 180 | def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, random=True, verbose=False): 181 | '''data generator for fit_generator''' 182 | n = len(annotation_lines) 183 | i = 0 184 | while True: 185 | image_data = [] 186 | box_data = [] 187 | for b in range(batch_size): 188 | if i==0 and random: 189 | np.random.shuffle(annotation_lines) 190 | image, box = get_random_data(annotation_lines[i], input_shape, random=random) 191 | image_data.append(image) 192 | box_data.append(box) 193 | i 
= (i+1) % n 194 | image_data = np.array(image_data) 195 | if verbose: 196 | print("Progress: ",i,"/",n) 197 | box_data = np.array(box_data) 198 | y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) 199 | yield [image_data, *y_true], np.zeros(batch_size) 200 | 201 | def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes, random=True, verbose=False): 202 | n = len(annotation_lines) 203 | if n==0 or batch_size<=0: return None 204 | return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, random, verbose) 205 | 206 | def bottleneck_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, bottlenecks): 207 | n = len(annotation_lines) 208 | i = 0 209 | while True: 210 | box_data = [] 211 | b0=np.zeros((batch_size,bottlenecks[0].shape[1],bottlenecks[0].shape[2],bottlenecks[0].shape[3])) 212 | b1=np.zeros((batch_size,bottlenecks[1].shape[1],bottlenecks[1].shape[2],bottlenecks[1].shape[3])) 213 | b2=np.zeros((batch_size,bottlenecks[2].shape[1],bottlenecks[2].shape[2],bottlenecks[2].shape[3])) 214 | for b in range(batch_size): 215 | _, box = get_random_data(annotation_lines[i], input_shape, random=False, proc_img=False) 216 | box_data.append(box) 217 | b0[b]=bottlenecks[0][i] 218 | b1[b]=bottlenecks[1][i] 219 | b2[b]=bottlenecks[2][i] 220 | i = (i+1) % n 221 | box_data = np.array(box_data) 222 | y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) 223 | yield [b0, b1, b2, *y_true], np.zeros(batch_size) 224 | 225 | if __name__ == '__main__': 226 | _main() 227 | -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | import xml.etree.ElementTree as ET 6 | from os import getcwd 7 | 8 | sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')] 9 | 10 | classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] 11 | 12 | 13 | def convert_annotation(year, image_id, list_file): 14 | in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id)) 15 | tree=ET.parse(in_file) 16 | root = tree.getroot() 17 | 18 | for obj in root.iter('object'): 19 | difficult = obj.find('difficult').text 20 | cls = obj.find('name').text 21 | if cls not in classes or int(difficult)==1: 22 | continue 23 | cls_id = classes.index(cls) 24 | xmlbox = obj.find('bndbox') 25 | b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text)) 26 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 27 | 28 | wd = getcwd() 29 | 30 | for year, image_set in sets: 31 | image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split() 32 | list_file = open('%s_%s.txt'%(year, image_set), 'w') 33 | for image_id in image_ids: 34 | list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg'%(wd, year, image_id)) 35 | convert_annotation(year, image_id, list_file) 36 | list_file.write('\n') 37 | list_file.close() 38 | 39 | -------------------------------------------------------------------------------- /yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on April, 2019 4 | 
@authors: Hulking 5 | """ 6 | import colorsys 7 | import os 8 | from timeit import default_timer as timer 9 | 10 | import numpy as np 11 | from keras import backend as K 12 | from keras.models import load_model 13 | from keras.layers import Input 14 | from PIL import Image, ImageFont, ImageDraw 15 | 16 | from yolo3.model import yolo_eval, yolo_body, tiny_yolo_body 17 | from yolo3.utils import letterbox_image 18 | import os 19 | from keras.utils import multi_gpu_model 20 | 21 | """ 22 | YOLO类 包含了针对模型的基本操作 23 | """ 24 | class YOLO(object): 25 | _defaults = { 26 | "model_path": 'model_data/yolo.h5', 27 | "anchors_path": 'model_data/yolo_anchors.txt', 28 | "classes_path": 'model_data/coco_classes.txt', 29 | "score" : 0.001, 30 | "iou" : 0.5, 31 | "model_image_size" : (416, 416), 32 | "gpu_num" : 4, 33 | } 34 | 35 | @classmethod 36 | def get_defaults(cls, n): 37 | if n in cls._defaults: 38 | return cls._defaults[n] 39 | else: 40 | return "Unrecognized attribute name '" + n + "'" 41 | 42 | def __init__(self, **kwargs): 43 | self.__dict__.update(self._defaults) # set up default values 44 | self.__dict__.update(kwargs) # and update with user overrides 45 | self.class_names = self._get_class() 46 | self.anchors = self._get_anchors() 47 | self.sess = K.get_session() 48 | self.boxes, self.scores, self.classes = self.generate() 49 | 50 | def _get_class(self): 51 | classes_path = os.path.expanduser(self.classes_path) 52 | with open(classes_path) as f: 53 | class_names = f.readlines() 54 | class_names = [c.strip() for c in class_names] 55 | return class_names 56 | 57 | def _get_anchors(self): 58 | anchors_path = os.path.expanduser(self.anchors_path) 59 | with open(anchors_path) as f: 60 | anchors = f.readline() 61 | anchors = [float(x) for x in anchors.split(',')] 62 | return np.array(anchors).reshape(-1, 2) 63 | 64 | def generate(self): 65 | model_path = os.path.expanduser(self.model_path) 66 | assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.' 67 | 68 | # Load model, or construct model and load weights. 69 | num_anchors = len(self.anchors) 70 | num_classes = len(self.class_names) 71 | is_tiny_version = num_anchors==6 # default setting 72 | try: 73 | self.yolo_model = load_model(model_path, compile=False) 74 | except: 75 | self.yolo_model = tiny_yolo_body(Input(shape=(None,None,3)), num_anchors//2, num_classes) \ 76 | if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes) 77 | self.yolo_model.load_weights(self.model_path) # make sure model, anchors and classes match 78 | else: 79 | assert self.yolo_model.layers[-1].output_shape[-1] == \ 80 | num_anchors/len(self.yolo_model.output) * (num_classes + 5), \ 81 | 'Mismatch between model and given anchor and class sizes' 82 | 83 | print('{} model, anchors, and classes loaded.'.format(model_path)) 84 | 85 | # Generate colors for drawing bounding boxes. 86 | hsv_tuples = [(x / len(self.class_names), 1., 1.) 87 | for x in range(len(self.class_names))] 88 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 89 | self.colors = list( 90 | map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 91 | self.colors)) 92 | np.random.seed(10101) # Fixed seed for consistent colors across runs. 93 | np.random.shuffle(self.colors) # Shuffle colors to decorrelate adjacent classes. 94 | np.random.seed(None) # Reset seed to default. 95 | 96 | # Generate output tensor targets for filtered bounding boxes. 
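        # What follows builds the evaluation graph once: a 2-element placeholder
        # carries the original image (height, width), the model is wrapped with
        # multi_gpu_model when gpu_num >= 2, and yolo_eval converts the raw output
        # tensors into filtered boxes, scores and class indices using the score
        # and iou thresholds configured in _defaults.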
97 | self.input_image_shape = K.placeholder(shape=(2, )) 98 | if self.gpu_num>=2: 99 | self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num) 100 | boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors, 101 | len(self.class_names), self.input_image_shape, 102 | score_threshold=self.score, iou_threshold=self.iou) 103 | return boxes, scores, classes 104 | 105 | """ 106 | 单张图片的预测: 107 | 输入:单张图片 108 | 返回:绘制了检测框 类别 概率的图片 109 | """ 110 | def detect_image(self, image): 111 | start = timer() 112 | 113 | if self.model_image_size != (None, None): 114 | assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required' 115 | assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required' 116 | boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size))) 117 | else: 118 | new_image_size = (image.width - (image.width % 32), 119 | image.height - (image.height % 32)) 120 | boxed_image = letterbox_image(image, new_image_size) 121 | image_data = np.array(boxed_image, dtype='float32') 122 | 123 | print(image_data.shape) 124 | image_data /= 255. 125 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 126 | 127 | out_boxes, out_scores, out_classes = self.sess.run( 128 | [self.boxes, self.scores, self.classes], 129 | feed_dict={ 130 | self.yolo_model.input: image_data, 131 | self.input_image_shape: [image.size[1], image.size[0]], 132 | K.learning_phase(): 0 133 | }) 134 | 135 | print('Found {} boxes for {}'.format(len(out_boxes), 'img')) 136 | 137 | font = ImageFont.truetype(font='font/FiraMono-Medium.otf', 138 | size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 139 | thickness = (image.size[0] + image.size[1]) // 300 140 | 141 | for i, c in reversed(list(enumerate(out_classes))): 142 | predicted_class = self.class_names[c] 143 | box = out_boxes[i] 144 | score = out_scores[i] 145 | 146 | label = '{} {:.2f}'.format(predicted_class, score) 147 | draw = ImageDraw.Draw(image) 148 | label_size = draw.textsize(label, font) 149 | 150 | top, left, bottom, right = box 151 | top = max(0, np.floor(top + 0.5).astype('int32')) 152 | left = max(0, np.floor(left + 0.5).astype('int32')) 153 | bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32')) 154 | right = min(image.size[0], np.floor(right + 0.5).astype('int32')) 155 | print(label, (left, top), (right, bottom)) 156 | 157 | if top - label_size[1] >= 0: 158 | text_origin = np.array([left, top - label_size[1]]) 159 | else: 160 | text_origin = np.array([left, top + 1]) 161 | 162 | # My kingdom for a good redistributable image drawing library. 
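            # The box border below is emulated by drawing `thickness` nested
            # 1-pixel rectangles, and the label is rendered on a filled rectangle
            # anchored at text_origin (above the box when there is room,
            # otherwise just inside its top edge).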
163 | for i in range(thickness): 164 | draw.rectangle( 165 | [left + i, top + i, right - i, bottom - i], 166 | outline=self.colors[c]) 167 | draw.rectangle( 168 | [tuple(text_origin), tuple(text_origin + label_size)], 169 | fill=self.colors[c]) 170 | draw.text(text_origin, label, fill=(0, 0, 0), font=font) 171 | del draw 172 | 173 | end = timer() 174 | print(end - start) 175 | return image 176 | 177 | 178 | """ 179 | 单张图片的预测: 180 | 输入:单张图片 181 | 返回:检测框左上角和右下角的点坐标 分数 类别 182 | """ 183 | def valid_image(self, image): 184 | start = timer() 185 | 186 | if self.model_image_size != (None, None): 187 | assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required' 188 | assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required' 189 | boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size))) 190 | else: 191 | new_image_size = (image.width - (image.width % 32), 192 | image.height - (image.height % 32)) 193 | boxed_image = letterbox_image(image, new_image_size) 194 | image_data = np.array(boxed_image, dtype='float32') 195 | 196 | # print(image_data.shape) 197 | image_data /= 255. 198 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 199 | 200 | out_boxes, out_scores, out_classes = self.sess.run( 201 | [self.boxes, self.scores, self.classes], 202 | feed_dict={ 203 | self.yolo_model.input: image_data, 204 | self.input_image_shape: [image.size[1], image.size[0]], 205 | K.learning_phase(): 0 206 | }) 207 | 208 | end = timer() 209 | print("time:",end - start) 210 | return out_boxes, out_scores, out_classes 211 | 212 | def close_session(self): 213 | self.sess.close() 214 | 215 | """ 216 | 视频预测: 217 | 输入:视频路径 218 | 返回:绘制了预测结果的视频 219 | """ 220 | def detect_video(yolo, video_path, output_path=""): 221 | import cv2 222 | vid = cv2.VideoCapture(video_path) 223 | if not vid.isOpened(): 224 | raise IOError("Couldn't open webcam or video") 225 | video_FourCC = int(vid.get(cv2.CAP_PROP_FOURCC)) 226 | video_fps = vid.get(cv2.CAP_PROP_FPS) 227 | video_size = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)), 228 | int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))) 229 | isOutput = True if output_path != "" else False 230 | if isOutput: 231 | print("!!! TYPE:", type(output_path), type(video_FourCC), type(video_fps), type(video_size)) 232 | out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size) 233 | accum_time = 0 234 | curr_fps = 0 235 | fps = "FPS: ??" 
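    # Main loop (below): read a frame, run yolo.detect_image on it, overlay a
    # running FPS counter (frames completed during the last accumulated second),
    # display the result in an OpenCV window, and optionally write it to the
    # output video until 'q' is pressed.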
236 | prev_time = timer() 237 | while True: 238 | return_value, frame = vid.read() 239 | image = Image.fromarray(frame) 240 | image = yolo.detect_image(image) 241 | result = np.asarray(image) 242 | curr_time = timer() 243 | exec_time = curr_time - prev_time 244 | prev_time = curr_time 245 | accum_time = accum_time + exec_time 246 | curr_fps = curr_fps + 1 247 | if accum_time > 1: 248 | accum_time = accum_time - 1 249 | fps = "FPS: " + str(curr_fps) 250 | curr_fps = 0 251 | cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX, 252 | fontScale=0.50, color=(255, 0, 0), thickness=2) 253 | cv2.namedWindow("result", cv2.WINDOW_NORMAL) 254 | cv2.imshow("result", result) 255 | if isOutput: 256 | out.write(result) 257 | if cv2.waitKey(1) & 0xFF == ord('q'): 258 | break 259 | yolo.close_session() 260 | 261 | -------------------------------------------------------------------------------- /yolo3/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HulkMaker/tensorflow-keras-yolov3/04a873529e9941a576e7058e47f8991d188ba15b/yolo3/__init__.py -------------------------------------------------------------------------------- /yolo3/model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | """YOLO_v3 Model Defined in Keras.""" 6 | 7 | from functools import wraps 8 | 9 | import numpy as np 10 | import tensorflow as tf 11 | from keras import backend as K 12 | from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D 13 | from keras.layers.advanced_activations import LeakyReLU 14 | from keras.layers.normalization import BatchNormalization 15 | from keras.models import Model 16 | from keras.regularizers import l2 17 | 18 | from yolo3.utils import compose 19 | 20 | """ 21 | 模型的组件 22 | """ 23 | @wraps(Conv2D) 24 | def DarknetConv2D(*args, **kwargs): 25 | """Wrapper to set Darknet parameters for Convolution2D.""" 26 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)} 27 | darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same' 28 | darknet_conv_kwargs.update(kwargs) 29 | return Conv2D(*args, **darknet_conv_kwargs) 30 | 31 | def DarknetConv2D_BN_Leaky(*args, **kwargs): 32 | """Darknet Convolution2D followed by BatchNormalization and LeakyReLU.""" 33 | no_bias_kwargs = {'use_bias': False} 34 | no_bias_kwargs.update(kwargs) 35 | return compose( 36 | DarknetConv2D(*args, **no_bias_kwargs), 37 | BatchNormalization(), 38 | LeakyReLU(alpha=0.1)) 39 | """ 40 | 残差 41 | """ 42 | def resblock_body(x, num_filters, num_blocks): 43 | '''A series of resblocks starting with a downsampling Convolution2D''' 44 | # Darknet uses left and top padding instead of 'same' mode 45 | x = ZeroPadding2D(((1,0),(1,0)))(x) 46 | x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x) 47 | for i in range(num_blocks): 48 | y = compose( 49 | DarknetConv2D_BN_Leaky(num_filters//2, (1,1)), 50 | DarknetConv2D_BN_Leaky(num_filters, (3,3)))(x) 51 | x = Add()([x,y]) 52 | return x 53 | """ 54 | 骨干网络darknet 55 | """ 56 | def darknet_body(x): 57 | '''Darknent body having 52 Convolution2D layers''' 58 | x = DarknetConv2D_BN_Leaky(32, (3,3))(x) 59 | x = resblock_body(x, 64, 1) 60 | x = resblock_body(x, 128, 2) 61 | x = resblock_body(x, 256, 8) 62 | x = resblock_body(x, 512, 8) 63 | x = resblock_body(x, 1024, 4) 64 | return x 65 | """ 66 | YOLO主体的组件 67 | """ 68 | def make_last_layers(x, 
num_filters, out_filters): 69 | '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer''' 70 | x = compose( 71 | DarknetConv2D_BN_Leaky(num_filters, (1,1)), 72 | DarknetConv2D_BN_Leaky(num_filters*2, (3,3)), 73 | DarknetConv2D_BN_Leaky(num_filters, (1,1)), 74 | DarknetConv2D_BN_Leaky(num_filters*2, (3,3)), 75 | DarknetConv2D_BN_Leaky(num_filters, (1,1)))(x) 76 | y = compose( 77 | DarknetConv2D_BN_Leaky(num_filters*2, (3,3)), 78 | DarknetConv2D(out_filters, (1,1)))(x) 79 | return x, y 80 | 81 | """ 82 | YOLO主体 83 | """ 84 | def yolo_body(inputs, num_anchors, num_classes): 85 | """Create YOLO_V3 model CNN body in Keras.""" 86 | darknet = Model(inputs, darknet_body(inputs)) 87 | x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5)) 88 | 89 | x = compose( 90 | DarknetConv2D_BN_Leaky(256, (1,1)), 91 | UpSampling2D(2))(x) 92 | x = Concatenate()([x,darknet.layers[152].output]) 93 | x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5)) 94 | 95 | x = compose( 96 | DarknetConv2D_BN_Leaky(128, (1,1)), 97 | UpSampling2D(2))(x) 98 | x = Concatenate()([x,darknet.layers[92].output]) 99 | x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5)) 100 | 101 | return Model(inputs, [y1,y2,y3]) 102 | """ 103 | 阉割版的YOLO 104 | """ 105 | def tiny_yolo_body(inputs, num_anchors, num_classes): 106 | '''Create Tiny YOLO_v3 model CNN body in keras.''' 107 | x1 = compose( 108 | DarknetConv2D_BN_Leaky(16, (3,3)), 109 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 110 | DarknetConv2D_BN_Leaky(32, (3,3)), 111 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 112 | DarknetConv2D_BN_Leaky(64, (3,3)), 113 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 114 | DarknetConv2D_BN_Leaky(128, (3,3)), 115 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 116 | DarknetConv2D_BN_Leaky(256, (3,3)))(inputs) 117 | x2 = compose( 118 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 119 | DarknetConv2D_BN_Leaky(512, (3,3)), 120 | MaxPooling2D(pool_size=(2,2), strides=(1,1), padding='same'), 121 | DarknetConv2D_BN_Leaky(1024, (3,3)), 122 | DarknetConv2D_BN_Leaky(256, (1,1)))(x1) 123 | y1 = compose( 124 | DarknetConv2D_BN_Leaky(512, (3,3)), 125 | DarknetConv2D(num_anchors*(num_classes+5), (1,1)))(x2) 126 | 127 | x2 = compose( 128 | DarknetConv2D_BN_Leaky(128, (1,1)), 129 | UpSampling2D(2))(x2) 130 | y2 = compose( 131 | Concatenate(), 132 | DarknetConv2D_BN_Leaky(256, (3,3)), 133 | DarknetConv2D(num_anchors*(num_classes+5), (1,1)))([x2,x1]) 134 | 135 | return Model(inputs, [y1,y2]) 136 | 137 | """ 138 | 将yolo曾输出格式进行转换,便于进行eval 139 | """ 140 | def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False): 141 | """Convert final layer features to bounding box parameters.""" 142 | num_anchors = len(anchors) 143 | # Reshape to batch, height, width, num_anchors, box_params. 144 | anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2]) 145 | 146 | grid_shape = K.shape(feats)[1:3] # height, width 147 | grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]), 148 | [1, grid_shape[1], 1, 1]) 149 | grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]), 150 | [grid_shape[0], 1, 1, 1]) 151 | grid = K.concatenate([grid_x, grid_y]) 152 | grid = K.cast(grid, K.dtype(feats)) 153 | 154 | feats = K.reshape( 155 | feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5]) 156 | 157 | # Adjust preditions to each spatial grid point and anchor size. 
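    # Written out, the two lines below implement the YOLOv3 box decoding. For raw
    # outputs t_xy, t_wh at grid cell c_xy with anchor p_wh:
    #   b_xy = (sigmoid(t_xy) + c_xy) / grid_wh      # centre, normalised to 0..1
    #   b_wh = p_wh * exp(t_wh) / input_wh           # size,   normalised to 0..1
    # where grid_wh and input_wh are the (width, height) forms of the hw-ordered
    # shapes, hence the [::-1] reversals.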
158 | box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats)) 159 | box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats)) 160 | box_confidence = K.sigmoid(feats[..., 4:5]) 161 | box_class_probs = K.sigmoid(feats[..., 5:]) 162 | 163 | if calc_loss == True: 164 | return grid, feats, box_xy, box_wh 165 | return box_xy, box_wh, box_confidence, box_class_probs 166 | 167 | """ 168 | 检测框形式由 左上角点+宽高 转换为 左上角点+右下角点 169 | """ 170 | def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape): 171 | '''Get corrected boxes''' 172 | box_yx = box_xy[..., ::-1] 173 | box_hw = box_wh[..., ::-1] 174 | input_shape = K.cast(input_shape, K.dtype(box_yx)) 175 | image_shape = K.cast(image_shape, K.dtype(box_yx)) 176 | new_shape = K.round(image_shape * K.min(input_shape/image_shape)) 177 | offset = (input_shape-new_shape)/2./input_shape 178 | scale = input_shape/new_shape 179 | box_yx = (box_yx - offset) * scale 180 | box_hw *= scale 181 | 182 | box_mins = box_yx - (box_hw / 2.) 183 | box_maxes = box_yx + (box_hw / 2.) 184 | boxes = K.concatenate([ 185 | box_mins[..., 0:1], # y_min 186 | box_mins[..., 1:2], # x_min 187 | box_maxes[..., 0:1], # y_max 188 | box_maxes[..., 1:2] # x_max 189 | ]) 190 | 191 | # Scale boxes back to original image shape. 192 | boxes *= K.concatenate([image_shape, image_shape]) 193 | return boxes 194 | 195 | """ 196 | 检测框和类别分数计算 197 | """ 198 | def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape): 199 | '''Process Conv layer output''' 200 | box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, 201 | anchors, num_classes, input_shape) 202 | boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) 203 | boxes = K.reshape(boxes, [-1, 4]) 204 | box_scores = box_confidence * box_class_probs 205 | box_scores = K.reshape(box_scores, [-1, num_classes]) 206 | return boxes, box_scores 207 | 208 | """ 209 | 将yolo层三个尺度的输出结果转换成检测框+分数+类别的形式 210 | """ 211 | def yolo_eval(yolo_outputs, 212 | anchors, 213 | num_classes, 214 | image_shape, 215 | max_boxes=20, 216 | score_threshold=.6, 217 | iou_threshold=.5): 218 | """Evaluate YOLO model on given input and return filtered boxes.""" 219 | num_layers = len(yolo_outputs) 220 | anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting 221 | input_shape = K.shape(yolo_outputs[0])[1:3] * 32 222 | boxes = [] 223 | box_scores = [] 224 | for l in range(num_layers): 225 | _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], 226 | anchors[anchor_mask[l]], num_classes, input_shape, image_shape) 227 | boxes.append(_boxes) 228 | box_scores.append(_box_scores) 229 | boxes = K.concatenate(boxes, axis=0) 230 | box_scores = K.concatenate(box_scores, axis=0) 231 | 232 | mask = box_scores >= score_threshold 233 | max_boxes_tensor = K.constant(max_boxes, dtype='int32') 234 | boxes_ = [] 235 | scores_ = [] 236 | classes_ = [] 237 | for c in range(num_classes): 238 | # TODO: use keras backend instead of tf. 
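        # Per-class NMS (implemented just below): keep only the boxes whose score
        # for class c passed score_threshold, run tf.image.non_max_suppression on
        # them with iou_threshold, and gather at most max_boxes survivors; the
        # per-class results are concatenated afterwards.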
239 | class_boxes = tf.boolean_mask(boxes, mask[:, c]) 240 | class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c]) 241 | nms_index = tf.image.non_max_suppression( 242 | class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold) 243 | class_boxes = K.gather(class_boxes, nms_index) 244 | class_box_scores = K.gather(class_box_scores, nms_index) 245 | classes = K.ones_like(class_box_scores, 'int32') * c 246 | boxes_.append(class_boxes) 247 | scores_.append(class_box_scores) 248 | classes_.append(classes) 249 | boxes_ = K.concatenate(boxes_, axis=0) 250 | scores_ = K.concatenate(scores_, axis=0) 251 | classes_ = K.concatenate(classes_, axis=0) 252 | 253 | return boxes_, scores_, classes_ 254 | 255 | """ 256 | 预处理真值的检测框 257 | """ 258 | def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes): 259 | '''Preprocess true boxes to training input format 260 | 261 | Parameters 262 | ---------- 263 | true_boxes: array, shape=(m, T, 5) 264 | Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape. 265 | input_shape: array-like, hw, multiples of 32 266 | anchors: array, shape=(N, 2), wh 267 | num_classes: integer 268 | 269 | Returns 270 | ------- 271 | y_true: list of array, shape like yolo_outputs, xywh are reletive value 272 | 273 | ''' 274 | assert (true_boxes[..., 4]0 295 | 296 | for b in range(m): 297 | # Discard zero rows. 298 | wh = boxes_wh[b, valid_mask[b]] 299 | if len(wh)==0: continue 300 | # Expand dim to apply broadcasting. 301 | wh = np.expand_dims(wh, -2) 302 | box_maxes = wh / 2. 303 | box_mins = -box_maxes 304 | 305 | intersect_mins = np.maximum(box_mins, anchor_mins) 306 | intersect_maxes = np.minimum(box_maxes, anchor_maxes) 307 | intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.) 308 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 309 | box_area = wh[..., 0] * wh[..., 1] 310 | anchor_area = anchors[..., 0] * anchors[..., 1] 311 | iou = intersect_area / (box_area + anchor_area - intersect_area) 312 | 313 | # Find best anchor for each true box 314 | best_anchor = np.argmax(iou, axis=-1) 315 | 316 | for t, n in enumerate(best_anchor): 317 | for l in range(num_layers): 318 | if n in anchor_mask[l]: 319 | i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32') 320 | j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32') 321 | k = anchor_mask[l].index(n) 322 | c = true_boxes[b,t, 4].astype('int32') 323 | y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4] 324 | y_true[l][b, j, i, k, 4] = 1 325 | y_true[l][b, j, i, k, 5+c] = 1 326 | 327 | return y_true 328 | 329 | """ 330 | 计算检测框之间的交叠比 331 | """ 332 | def box_iou(b1, b2): 333 | '''Return iou tensor 334 | 335 | Parameters 336 | ---------- 337 | b1: tensor, shape=(i1,...,iN, 4), xywh 338 | b2: tensor, shape=(j, 4), xywh 339 | 340 | Returns 341 | ------- 342 | iou: tensor, shape=(i1,...,iN, j) 343 | 344 | ''' 345 | 346 | # Expand dim to apply broadcasting. 347 | b1 = K.expand_dims(b1, -2) 348 | b1_xy = b1[..., :2] 349 | b1_wh = b1[..., 2:4] 350 | b1_wh_half = b1_wh/2. 351 | b1_mins = b1_xy - b1_wh_half 352 | b1_maxes = b1_xy + b1_wh_half 353 | 354 | # Expand dim to apply broadcasting. 355 | b2 = K.expand_dims(b2, 0) 356 | b2_xy = b2[..., :2] 357 | b2_wh = b2[..., 2:4] 358 | b2_wh_half = b2_wh/2. 
359 | b2_mins = b2_xy - b2_wh_half 360 | b2_maxes = b2_xy + b2_wh_half 361 | 362 | intersect_mins = K.maximum(b1_mins, b2_mins) 363 | intersect_maxes = K.minimum(b1_maxes, b2_maxes) 364 | intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.) 365 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 366 | b1_area = b1_wh[..., 0] * b1_wh[..., 1] 367 | b2_area = b2_wh[..., 0] * b2_wh[..., 1] 368 | iou = intersect_area / (b1_area + b2_area - intersect_area) 369 | 370 | return iou 371 | 372 | """ 373 | 损失函数 374 | """ 375 | def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False): 376 | '''Return yolo_loss tensor 377 | 378 | Parameters 379 | ---------- 380 | yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body 381 | y_true: list of array, the output of preprocess_true_boxes 382 | anchors: array, shape=(N, 2), wh 383 | num_classes: integer 384 | ignore_thresh: float, the iou threshold whether to ignore object confidence loss 385 | 386 | Returns 387 | ------- 388 | loss: tensor, shape=(1,) 389 | 390 | ''' 391 | num_layers = len(anchors)//3 # default setting 392 | yolo_outputs = args[:num_layers] 393 | y_true = args[num_layers:] 394 | anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] 395 | input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0])) 396 | grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)] 397 | loss = 0 398 | m = K.shape(yolo_outputs[0])[0] # batch size, tensor 399 | mf = K.cast(m, K.dtype(yolo_outputs[0])) 400 | 401 | for l in range(num_layers): 402 | object_mask = y_true[l][..., 4:5] 403 | true_class_probs = y_true[l][..., 5:] 404 | 405 | grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l], 406 | anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True) 407 | pred_box = K.concatenate([pred_xy, pred_wh]) 408 | 409 | # Darknet raw box to calculate loss. 410 | raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid 411 | raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1]) 412 | raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf 413 | box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4] 414 | 415 | # Find ignore mask, iterate over each of batch. 
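        # The ignore mask computed below marks, per image, the predicted boxes
        # whose best IoU against any ground-truth box stays under ignore_thresh;
        # only those boxes are penalised by the "no object" confidence term, so
        # reasonable predictions that merely were not assigned to this anchor are
        # not punished.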
416 | ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True) 417 | object_mask_bool = K.cast(object_mask, 'bool') 418 | def loop_body(b, ignore_mask): 419 | true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0]) 420 | iou = box_iou(pred_box[b], true_box) 421 | best_iou = K.max(iou, axis=-1) 422 | ignore_mask = ignore_mask.write(b, K.cast(best_iou0: 67 | np.random.shuffle(box) 68 | if len(box)>max_boxes: box = box[:max_boxes] 69 | box[:, [0,2]] = box[:, [0,2]]*scale + dx 70 | box[:, [1,3]] = box[:, [1,3]]*scale + dy 71 | box_data[:len(box)] = box 72 | 73 | return image_data, box_data 74 | 75 | # resize image 76 | new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter) 77 | scale = rand(.25, 2) 78 | if new_ar < 1: 79 | nh = int(scale*h) 80 | nw = int(nh*new_ar) 81 | else: 82 | nw = int(scale*w) 83 | nh = int(nw/new_ar) 84 | image = image.resize((nw,nh), Image.BICUBIC) 85 | 86 | # place image 87 | dx = int(rand(0, w-nw)) 88 | dy = int(rand(0, h-nh)) 89 | new_image = Image.new('RGB', (w,h), (128,128,128)) 90 | new_image.paste(image, (dx, dy)) 91 | image = new_image 92 | 93 | # flip image or not 94 | flip = rand()<.5 95 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 96 | 97 | # distort image 98 | hue = rand(-hue, hue) 99 | sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat) 100 | val = rand(1, val) if rand()<.5 else 1/rand(1, val) 101 | x = rgb_to_hsv(np.array(image)/255.) 102 | x[..., 0] += hue 103 | x[..., 0][x[..., 0]>1] -= 1 104 | x[..., 0][x[..., 0]<0] += 1 105 | x[..., 1] *= sat 106 | x[..., 2] *= val 107 | x[x>1] = 1 108 | x[x<0] = 0 109 | image_data = hsv_to_rgb(x) # numpy array, 0 to 1 110 | 111 | # correct boxes 112 | box_data = np.zeros((max_boxes,5)) 113 | if len(box)>0: 114 | np.random.shuffle(box) 115 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 116 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 117 | if flip: box[:, [0,2]] = w - box[:, [2,0]] 118 | box[:, 0:2][box[:, 0:2]<0] = 0 119 | box[:, 2][box[:, 2]>w] = w 120 | box[:, 3][box[:, 3]>h] = h 121 | box_w = box[:, 2] - box[:, 0] 122 | box_h = box[:, 3] - box[:, 1] 123 | box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box 124 | if len(box)>max_boxes: box = box[:max_boxes] 125 | box_data[:len(box)] = box 126 | 127 | return image_data, box_data 128 | -------------------------------------------------------------------------------- /yolo_valid.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | 6 | import os 7 | import cv2 8 | import numpy as np 9 | import tensorflow as tf 10 | import sys 11 | import argparse 12 | from yolo import YOLO, detect_video 13 | from PIL import Image 14 | from pycocoEval import cal_coco_map 15 | 16 | import glob 17 | from keras.preprocessing.image import load_img, img_to_array 18 | from keras.applications.imagenet_utils import preprocess_input as preprocess 19 | from keras import backend as K 20 | 21 | np.set_printoptions(suppress=True) # to supress scientific notation in print 22 | 23 | from functools import wraps 24 | 25 | ''' 26 | 计算全部图片检测结果 27 | ''' 28 | def valid_detector(yolo): 29 | classes_path="model_data/coco_classes.txt" 30 | with open(classes_path) as f: 31 | obj_list = f.readlines() 32 | ## remove whitespace characters like `\n` at the end of each line 33 | obj_list = [x.strip() for x in obj_list] 34 | 35 | coco_ids= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 36 | 33, 
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 37 | 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 38 | 89, 90] 39 | 40 | imglist_path="model_data/5k.txt" 41 | dt_result_path = "results/cocoapi_results.json" 42 | 43 | if os.path.exists(dt_result_path): 44 | os.remove(dt_result_path) 45 | with open(dt_result_path, "a") as new_p: 46 | new_p.write("[") 47 | with open(imglist_path) as f: 48 | total_img_list = f.readlines() 49 | total_img_list = [x.strip() for x in total_img_list] 50 | total_img_num = len(total_img_list) 51 | i=0 52 | for image_path in total_img_list: 53 | 54 | if (os.path.exists(image_path)): 55 | print(i,image_path) 56 | 57 | orig_index=int(image_path[50:56]) 58 | img = Image.open(image_path) 59 | 60 | boxes, scores, classes =yolo.valid_image(img) 61 | 62 | for j in range(len(classes)): 63 | coco_id=coco_ids[int(classes[j])] 64 | top, left, bottom, right=boxes[j] 65 | 66 | width=round(right-left,4) 67 | height=round(bottom-top,4) 68 | 69 | # print("\ni, j ",i,j) 70 | # print("\nleft, top, width, height ",left, top, width, height) 71 | 72 | if i==(total_img_num-1) and j== (len(classes)-1): 73 | new_p.write( 74 | "{\"image_id\":" + str(orig_index) + ", \"category_id\":" + str(coco_id) + ", \"bbox\":[" + \ 75 | str(left) + ", " + str(top) + ", " + str(width) + ", " + str(height) + "], \"score\":" + str(scores[j]) + "}") 76 | else: 77 | #print("corrected left, top, width, height", left, top, width, height) 78 | new_p.write( 79 | "{\"image_id\":"+str(orig_index)+", \"category_id\":" +str(coco_id)+ ", \"bbox\":[" + \ 80 | str(left)+ ", " + str(top) + ", " + str(width) + ", " + str(height) + "], \"score\":"+str(scores[j]) +"},\n") 81 | i += 1 82 | new_p.write("]") 83 | print("\n\n\n") 84 | 85 | if __name__=='__main__': 86 | valid_detector(YOLO()) 87 | cal_coco_map() 88 | 89 | 90 | 91 | -------------------------------------------------------------------------------- /yolo_video.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import argparse 3 | from yolo import YOLO, detect_video 4 | from PIL import Image 5 | """ 6 | 单张图片计算 7 | """ 8 | def detect_img(yolo): 9 | while True: 10 | img = input('Input image filename:') 11 | print(img) 12 | try: 13 | image = Image.open(img) 14 | except: 15 | print('Open Error! 
Try again!') 16 | continue 17 | else: 18 | r_image = yolo.valid_image(image) 19 | r_image.show() 20 | yolo.close_session() 21 | 22 | FLAGS = None 23 | 24 | if __name__ == '__main__': 25 | # class YOLO defines the default value, so suppress any default here 26 | parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS) 27 | ''' 28 | Command line options 29 | ''' 30 | parser.add_argument( 31 | '--model', type=str, 32 | help='path to model weight file, default ' + YOLO.get_defaults("model_path") 33 | ) 34 | 35 | parser.add_argument( 36 | '--anchors', type=str, 37 | help='path to anchor definitions, default ' + YOLO.get_defaults("anchors_path") 38 | ) 39 | 40 | parser.add_argument( 41 | '--classes', type=str, 42 | help='path to class definitions, default ' + YOLO.get_defaults("classes_path") 43 | ) 44 | 45 | parser.add_argument( 46 | '--gpu_num', type=int, 47 | help='Number of GPU to use, default ' + str(YOLO.get_defaults("gpu_num")) 48 | ) 49 | 50 | parser.add_argument( 51 | '--image', default=False, action="store_true", 52 | help='Image detection mode, will ignore all positional arguments' 53 | ) 54 | ''' 55 | Command line positional arguments -- for video detection mode 56 | ''' 57 | parser.add_argument( 58 | "--input", nargs='?', type=str,required=False,default='./path2your_video', 59 | help = "Video input path" 60 | ) 61 | 62 | parser.add_argument( 63 | "--output", nargs='?', type=str, default="", 64 | help = "[Optional] Video output path" 65 | ) 66 | 67 | FLAGS = parser.parse_args() 68 | 69 | if FLAGS.image: 70 | """ 71 | Image detection mode, disregard any remaining command line arguments 72 | """ 73 | print("Image detection mode") 74 | if "input" in FLAGS: 75 | print(" Ignoring remaining command line arguments: " + FLAGS.input + "," + FLAGS.output) 76 | detect_img(YOLO(**vars(FLAGS))) 77 | elif "input" in FLAGS: 78 | detect_video(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output) 79 | else: 80 | print("Must specify at least video_input_path. 
See usage with --help.") 81 | -------------------------------------------------------------------------------- /yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=2 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=16 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=64 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [maxpool] 58 | size=2 59 | stride=2 60 | 61 | [convolutional] 62 | batch_normalize=1 63 | filters=128 64 | size=3 65 | stride=1 66 | pad=1 67 | activation=leaky 68 | 69 | [maxpool] 70 | size=2 71 | stride=2 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=256 76 | size=3 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [maxpool] 82 | size=2 83 | stride=2 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=512 88 | size=3 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [maxpool] 94 | size=2 95 | stride=1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=1024 100 | size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | ########### 106 | 107 | [convolutional] 108 | batch_normalize=1 109 | filters=256 110 | size=1 111 | stride=1 112 | pad=1 113 | activation=leaky 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=512 118 | size=3 119 | stride=1 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | size=1 125 | stride=1 126 | pad=1 127 | filters=255 128 | activation=linear 129 | 130 | 131 | 132 | [yolo] 133 | mask = 3,4,5 134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 135 | classes=80 136 | num=6 137 | jitter=.3 138 | ignore_thresh = .7 139 | truth_thresh = 1 140 | random=1 141 | 142 | [route] 143 | layers = -4 144 | 145 | [convolutional] 146 | batch_normalize=1 147 | filters=128 148 | size=1 149 | stride=1 150 | pad=1 151 | activation=leaky 152 | 153 | [upsample] 154 | stride=2 155 | 156 | [route] 157 | layers = -1, 8 158 | 159 | [convolutional] 160 | batch_normalize=1 161 | filters=256 162 | size=3 163 | stride=1 164 | pad=1 165 | activation=leaky 166 | 167 | [convolutional] 168 | size=1 169 | stride=1 170 | pad=1 171 | filters=255 172 | activation=linear 173 | 174 | [yolo] 175 | mask = 1,2,3 176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 177 | classes=80 178 | num=6 179 | jitter=.3 180 | ignore_thresh = .7 181 | truth_thresh = 1 182 | random=1 183 | -------------------------------------------------------------------------------- /yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=32 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | 
burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 
| activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 
463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | 
batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | --------------------------------------------------------------------------------