├── LICENSE ├── Original.jpg ├── README.md ├── coco_annotation.py ├── convert.py ├── darknet53.cfg ├── draw_bbox.py ├── font ├── FiraMono-Medium.otf └── SIL Open Font License.txt ├── kmeans.py ├── model_data ├── 5k.txt ├── coco_classes.txt ├── tiny_yolo_anchors.txt ├── voc_classes.txt └── yolo_anchors.txt ├── pycocoEval.py ├── train.py ├── train_bottleneck.py ├── voc_annotation.py ├── yolo.py ├── yolo3 ├── __init__.py ├── model.py └── utils.py ├── yolo_valid.py ├── yolo_video.py ├── yolov3-tiny.cfg └── yolov3.cfg /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 qqwweee 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Original.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HulkMaker/tensorflow-keras-yolov3/04a873529e9941a576e7058e47f8991d188ba15b/Original.jpg -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # tensorflow-keras-yolov3 2 | (cocoapi mAP计算在下方↓↓↓) 3 | [![license](https://img.shields.io/github/license/mashape/apistatus.svg)](LICENSE) 4 | 5 | --- 6 | 7 | ### Quick Start 8 | 9 | 1. The test environment is 10 | - cudatoolkit 9.2 11 | - cudnn 7.2.1 12 | - Python 3.6.8 13 | - Keras 2.2.0 14 | - tensorflow 1.10.0 15 | - pillow = 5.4.1 16 | - matplotlib 3.0.2 17 | 18 | 2. Download YOLOv3 weights from [YOLO website](http://pjreddie.com/darknet/yolo/). 19 | 3. Convert the Darknet YOLO model to a Keras model .h5 file. 20 | 4. Modified default converted model path in yolo.py line26 (default in '/home/common/pretrained_models/yolo.h5') 21 | 22 | ### Run single image detection demo 23 | ``` 24 | wget https://pjreddie.com/media/files/yolov3.weights 25 | python convert.py yolov3.cfg yolov3.weights /home/common/pretrained_models/yolo.h5 26 | python yolo_video.py [OPTIONS...] --image, for image detection mode, OR 27 | python yolo_video.py [video_path] [output_path (optional)] 28 | For Tiny YOLOv3, just do in a similar way, just specify model path and anchor path with `--model model_file` and `--anchors anchor_file`. 29 | ``` 30 | --- 31 | ### Calcualte mAP on cocoapi 32 | ``` 33 | 1. cd tensorflow-keras-yolov3 34 | 2. 
pip install cython # solution of issue:(gcc: error: pycocotools/_mask.c: No such file or directory) 35 | 3. sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools ./ 36 | 4. Use `python yolo_valid.py` to test the official YOLOv3 weights. 37 | ``` 38 | 39 | --- 40 | ### Other usage 41 | Use --help to see usage of yolo_video.py: 42 | ``` 43 | usage: yolo_video.py [-h] [--model MODEL] [--anchors ANCHORS] 44 | [--classes CLASSES] [--gpu_num GPU_NUM] [--image] 45 | [--input] [--output] 46 | 47 | positional arguments: 48 | --input Video input path 49 | --output Video output path 50 | 51 | optional arguments: 52 | -h, --help show this help message and exit 53 | --model MODEL path to model weight file, default model_data/yolo.h5 54 | --anchors ANCHORS path to anchor definitions, default 55 | model_data/yolo_anchors.txt 56 | --classes CLASSES path to class definitions, default 57 | model_data/coco_classes.txt 58 | --gpu_num GPU_NUM Number of GPU to use, default 1 59 | --image Image detection mode, will ignore all positional arguments 60 | ``` 61 | 4. MultiGPU usage: use `--gpu_num N` to use N GPUs. It is passed to the [Keras multi_gpu_model()](https://keras.io/utils/#multi_gpu_model). 62 | --- 63 | ### Training 64 | 65 | 1. Generate your own annotation file and class names file. 66 | One row for one image; 67 | Row format: `image_file_path box1 box2 ... boxN`; 68 | Box format: `x_min,y_min,x_max,y_max,class_id` (no space). 69 | For VOC dataset, try `python voc_annotation.py` 70 | Here is an example: 71 | ``` 72 | path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3 73 | path/to/img2.jpg 120,300,250,600,2 74 | ... 75 | ``` 76 | 77 | 2. Make sure you have run `python convert.py -w yolov3.cfg yolov3.weights model_data/yolo_weights.h5` 78 | The file model_data/yolo_weights.h5 is used to load pretrained weights. 79 | 80 | 3. Modify train.py and start training. 81 | `python train.py` 82 | Use your trained weights or checkpoint weights with command line option `--model model_file` when using yolo_video.py 83 | Remember to modify class path or anchor path, with `--classes class_file` and `--anchors anchor_file`. 84 | 85 | If you want to use original pretrained weights for YOLOv3: 86 | 1. `wget https://pjreddie.com/media/files/darknet53.conv.74` 87 | 2. rename it as darknet53.weights 88 | 3. `python convert.py -w darknet53.cfg darknet53.weights model_data/darknet53_weights.h5` 89 | 4. use model_data/darknet53_weights.h5 in train.py 90 | 91 | --- 92 | ### Some issues to know 93 | 94 | 95 | 96 | 1. Default anchors are used. If you use your own anchors, probably some changes are needed. 97 | 98 | 2. The inference result is not totally the same as Darknet but the difference is small. 99 | 100 | 3. The speed is slower than Darknet. Replacing PIL with opencv may help a little. 101 | 102 | 4. Always load pretrained weights and freeze layers in the first stage of training. Or try Darknet training. It's OK if there is a mismatch warning. 
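As a quick reference for the annotation format described in the Training section above, here is a minimal sketch (the helper name and example paths are only illustrative) that parses one annotation line back into its image path and boxes:
```
# Minimal sketch: parse one line of the annotation format
# "image_file_path box1 box2 ... boxN", box = "x_min,y_min,x_max,y_max,class_id".
def parse_annotation_line(line):
    parts = line.strip().split()
    image_path, boxes = parts[0], []
    for box in parts[1:]:
        x_min, y_min, x_max, y_max, class_id = map(int, box.split(','))
        boxes.append((x_min, y_min, x_max, y_max, class_id))
    return image_path, boxes

image_path, boxes = parse_annotation_line(
    "path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3")
print(image_path, boxes)  # path/to/img1.jpg [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)]
```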
103 | 104 | 105 | # tensorflow-keras-yolov3 106 | -------------------------------------------------------------------------------- /coco_annotation.py: -------------------------------------------------------------------------------- 1 | import json 2 | from collections import defaultdict 3 | 4 | name_box_id = defaultdict(list) 5 | id_name = dict() 6 | f = open( 7 | "mscoco2017/annotations/instances_train2017.json", 8 | encoding='utf-8') 9 | data = json.load(f) 10 | 11 | annotations = data['annotations'] 12 | for ant in annotations: 13 | id = ant['image_id'] 14 | name = 'mscoco2017/train2017/%012d.jpg' % id 15 | cat = ant['category_id'] 16 | 17 | if cat >= 1 and cat <= 11: 18 | cat = cat - 1 19 | elif cat >= 13 and cat <= 25: 20 | cat = cat - 2 21 | elif cat >= 27 and cat <= 28: 22 | cat = cat - 3 23 | elif cat >= 31 and cat <= 44: 24 | cat = cat - 5 25 | elif cat >= 46 and cat <= 65: 26 | cat = cat - 6 27 | elif cat == 67: 28 | cat = cat - 7 29 | elif cat == 70: 30 | cat = cat - 9 31 | elif cat >= 72 and cat <= 82: 32 | cat = cat - 10 33 | elif cat >= 84 and cat <= 90: 34 | cat = cat - 11 35 | 36 | name_box_id[name].append([ant['bbox'], cat]) 37 | 38 | f = open('train.txt', 'w') 39 | for key in name_box_id.keys(): 40 | f.write(key) 41 | box_infos = name_box_id[key] 42 | for info in box_infos: 43 | x_min = int(info[0][0]) 44 | y_min = int(info[0][1]) 45 | x_max = x_min + int(info[0][2]) 46 | y_max = y_min + int(info[0][3]) 47 | 48 | box_info = " %d,%d,%d,%d,%d" % ( 49 | x_min, y_min, x_max, y_max, int(info[1])) 50 | f.write(box_info) 51 | f.write('\n') 52 | f.close() 53 | -------------------------------------------------------------------------------- /convert.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | """ 3 | Created on April, 2019 4 | @authors: Hulking 5 | """ 6 | """ 7 | Reads Darknet config and weights and creates Keras model with TF backend. 8 | """ 9 | 10 | import argparse 11 | import configparser 12 | import io 13 | import os 14 | from collections import defaultdict 15 | 16 | import numpy as np 17 | from keras import backend as K 18 | from keras.layers import (Conv2D, Input, ZeroPadding2D, Add, 19 | UpSampling2D, MaxPooling2D, Concatenate) 20 | from keras.layers.advanced_activations import LeakyReLU 21 | from keras.layers.normalization import BatchNormalization 22 | from keras.models import Model 23 | from keras.regularizers import l2 24 | from keras.utils.vis_utils import plot_model as plot 25 | 26 | 27 | parser = argparse.ArgumentParser(description='Darknet To Keras Converter.') 28 | parser.add_argument('config_path', help='Path to Darknet cfg file.') 29 | parser.add_argument('weights_path', help='Path to Darknet weights file.') 30 | parser.add_argument('output_path', help='Path to output Keras model file.') 31 | parser.add_argument( 32 | '-p', 33 | '--plot_model', 34 | help='Plot generated Keras model and save as image.', 35 | action='store_true') 36 | parser.add_argument( 37 | '-w', 38 | '--weights_only', 39 | help='Save as Keras weights file instead of model file.', 40 | action='store_true') 41 | 42 | def unique_config_sections(config_file): 43 | """Convert all config sections to have unique names. 44 | 45 | Adds unique suffixes to config sections for compability with configparser. 
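    For example, the repeated [convolutional] sections of a Darknet cfg are
    renamed [convolutional_0], [convolutional_1], ... because configparser
    rejects duplicate section names.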
46 | """ 47 | section_counters = defaultdict(int) 48 | output_stream = io.StringIO() 49 | with open(config_file) as fin: 50 | for line in fin: 51 | if line.startswith('['): 52 | section = line.strip().strip('[]') 53 | _section = section + '_' + str(section_counters[section]) 54 | section_counters[section] += 1 55 | line = line.replace(section, _section) 56 | output_stream.write(line) 57 | output_stream.seek(0) 58 | return output_stream 59 | 60 | # %% 61 | def _main(args): 62 | config_path = os.path.expanduser(args.config_path) 63 | weights_path = os.path.expanduser(args.weights_path) 64 | assert config_path.endswith('.cfg'), '{} is not a .cfg file'.format( 65 | config_path) 66 | assert weights_path.endswith( 67 | '.weights'), '{} is not a .weights file'.format(weights_path) 68 | 69 | output_path = os.path.expanduser(args.output_path) 70 | assert output_path.endswith( 71 | '.h5'), 'output path {} is not a .h5 file'.format(output_path) 72 | output_root = os.path.splitext(output_path)[0] 73 | 74 | # Load weights and config. 75 | print('Loading weights.') 76 | weights_file = open(weights_path, 'rb') 77 | major, minor, revision = np.ndarray( 78 | shape=(3, ), dtype='int32', buffer=weights_file.read(12)) 79 | if (major*10+minor)>=2 and major<1000 and minor<1000: 80 | seen = np.ndarray(shape=(1,), dtype='int64', buffer=weights_file.read(8)) 81 | else: 82 | seen = np.ndarray(shape=(1,), dtype='int32', buffer=weights_file.read(4)) 83 | print('Weights Header: ', major, minor, revision, seen) 84 | 85 | print('Parsing Darknet config.') 86 | unique_config_file = unique_config_sections(config_path) 87 | cfg_parser = configparser.ConfigParser() 88 | cfg_parser.read_file(unique_config_file) 89 | 90 | print('Creating Keras model.') 91 | input_layer = Input(shape=(None, None, 3)) 92 | prev_layer = input_layer 93 | all_layers = [] 94 | 95 | weight_decay = float(cfg_parser['net_0']['decay'] 96 | ) if 'net_0' in cfg_parser.sections() else 5e-4 97 | count = 0 98 | out_index = [] 99 | for section in cfg_parser.sections(): 100 | print('Parsing section {}'.format(section)) 101 | if section.startswith('convolutional'): 102 | filters = int(cfg_parser[section]['filters']) 103 | size = int(cfg_parser[section]['size']) 104 | stride = int(cfg_parser[section]['stride']) 105 | pad = int(cfg_parser[section]['pad']) 106 | activation = cfg_parser[section]['activation'] 107 | batch_normalize = 'batch_normalize' in cfg_parser[section] 108 | 109 | padding = 'same' if pad == 1 and stride == 1 else 'valid' 110 | 111 | # Setting weights. 
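            # Note: when batch_normalize is set, the per-filter bias read below
            # is reused as the BatchNormalization beta (shift) and the Conv2D is
            # built with use_bias=False; otherwise it is the ordinary conv bias.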
112 | # Darknet serializes convolutional weights as: 113 | # [bias/beta, [gamma, mean, variance], conv_weights] 114 | prev_layer_shape = K.int_shape(prev_layer) 115 | 116 | weights_shape = (size, size, prev_layer_shape[-1], filters) 117 | darknet_w_shape = (filters, weights_shape[2], size, size) 118 | weights_size = np.product(weights_shape) 119 | 120 | print('conv2d', 'bn' 121 | if batch_normalize else ' ', activation, weights_shape) 122 | 123 | conv_bias = np.ndarray( 124 | shape=(filters, ), 125 | dtype='float32', 126 | buffer=weights_file.read(filters * 4)) 127 | count += filters 128 | 129 | if batch_normalize: 130 | bn_weights = np.ndarray( 131 | shape=(3, filters), 132 | dtype='float32', 133 | buffer=weights_file.read(filters * 12)) 134 | count += 3 * filters 135 | 136 | bn_weight_list = [ 137 | bn_weights[0], # scale gamma 138 | conv_bias, # shift beta 139 | bn_weights[1], # running mean 140 | bn_weights[2] # running var 141 | ] 142 | 143 | conv_weights = np.ndarray( 144 | shape=darknet_w_shape, 145 | dtype='float32', 146 | buffer=weights_file.read(weights_size * 4)) 147 | count += weights_size 148 | 149 | # DarkNet conv_weights are serialized Caffe-style: 150 | # (out_dim, in_dim, height, width) 151 | # We would like to set these to Tensorflow order: 152 | # (height, width, in_dim, out_dim) 153 | conv_weights = np.transpose(conv_weights, [2, 3, 1, 0]) 154 | conv_weights = [conv_weights] if batch_normalize else [ 155 | conv_weights, conv_bias 156 | ] 157 | 158 | # Handle activation. 159 | act_fn = None 160 | if activation == 'leaky': 161 | pass # Add advanced activation later. 162 | elif activation != 'linear': 163 | raise ValueError( 164 | 'Unknown activation function `{}` in section {}'.format( 165 | activation, section)) 166 | 167 | # Create Conv2D layer 168 | if stride>1: 169 | # Darknet uses left and top padding instead of 'same' mode 170 | prev_layer = ZeroPadding2D(((1,0),(1,0)))(prev_layer) 171 | conv_layer = (Conv2D( 172 | filters, (size, size), 173 | strides=(stride, stride), 174 | kernel_regularizer=l2(weight_decay), 175 | use_bias=not batch_normalize, 176 | weights=conv_weights, 177 | activation=act_fn, 178 | padding=padding))(prev_layer) 179 | 180 | if batch_normalize: 181 | conv_layer = (BatchNormalization( 182 | weights=bn_weight_list))(conv_layer) 183 | prev_layer = conv_layer 184 | 185 | if activation == 'linear': 186 | all_layers.append(prev_layer) 187 | elif activation == 'leaky': 188 | act_layer = LeakyReLU(alpha=0.1)(prev_layer) 189 | prev_layer = act_layer 190 | all_layers.append(act_layer) 191 | 192 | elif section.startswith('route'): 193 | ids = [int(i) for i in cfg_parser[section]['layers'].split(',')] 194 | layers = [all_layers[i] for i in ids] 195 | if len(layers) > 1: 196 | print('Concatenating route layers:', layers) 197 | concatenate_layer = Concatenate()(layers) 198 | all_layers.append(concatenate_layer) 199 | prev_layer = concatenate_layer 200 | else: 201 | skip_layer = layers[0] # only one layer to route 202 | all_layers.append(skip_layer) 203 | prev_layer = skip_layer 204 | 205 | elif section.startswith('maxpool'): 206 | size = int(cfg_parser[section]['size']) 207 | stride = int(cfg_parser[section]['stride']) 208 | all_layers.append( 209 | MaxPooling2D( 210 | pool_size=(size, size), 211 | strides=(stride, stride), 212 | padding='same')(prev_layer)) 213 | prev_layer = all_layers[-1] 214 | 215 | elif section.startswith('shortcut'): 216 | index = int(cfg_parser[section]['from']) 217 | activation = cfg_parser[section]['activation'] 218 | assert 
activation == 'linear', 'Only linear activation supported.' 219 | all_layers.append(Add()([all_layers[index], prev_layer])) 220 | prev_layer = all_layers[-1] 221 | 222 | elif section.startswith('upsample'): 223 | stride = int(cfg_parser[section]['stride']) 224 | assert stride == 2, 'Only stride=2 supported.' 225 | all_layers.append(UpSampling2D(stride)(prev_layer)) 226 | prev_layer = all_layers[-1] 227 | 228 | elif section.startswith('yolo'): 229 | out_index.append(len(all_layers)-1) 230 | all_layers.append(None) 231 | prev_layer = all_layers[-1] 232 | 233 | elif section.startswith('net'): 234 | pass 235 | 236 | else: 237 | raise ValueError( 238 | 'Unsupported section header type: {}'.format(section)) 239 | 240 | # Create and save model. 241 | if len(out_index)==0: out_index.append(len(all_layers)-1) 242 | model = Model(inputs=input_layer, outputs=[all_layers[i] for i in out_index]) 243 | print(model.summary()) 244 | if args.weights_only: 245 | model.save_weights('{}'.format(output_path)) 246 | print('Saved Keras weights to {}'.format(output_path)) 247 | else: 248 | model.save('{}'.format(output_path)) 249 | print('Saved Keras model to {}'.format(output_path)) 250 | 251 | # Check to see if all weights have been read. 252 | remaining_weights = len(weights_file.read()) / 4 253 | weights_file.close() 254 | print('Read {} of {} from Darknet weights.'.format(count, count + 255 | remaining_weights)) 256 | if remaining_weights > 0: 257 | print('Warning: {} unused weights'.format(remaining_weights)) 258 | 259 | if args.plot_model: 260 | plot(model, to_file='{}.png'.format(output_root), show_shapes=True) 261 | print('Saved model plot to {}.png'.format(output_root)) 262 | 263 | 264 | if __name__ == '__main__': 265 | _main(parser.parse_args()) 266 | -------------------------------------------------------------------------------- /darknet53.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | 
filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | 
[convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 
543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | -------------------------------------------------------------------------------- /draw_bbox.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | """ 4 | Created on April, 2019 5 | @authors: Hulking 6 | """ 7 | import os 8 | import cv2 9 | import glob 10 | import numpy as np 11 | from keras.preprocessing.image import load_img, img_to_array 12 | from keras.applications.imagenet_utils import preprocess_input as preprocess 13 | from pycocotools.coco import COCO 14 | import skimage.io as io 15 | import matplotlib.pyplot as plt 16 | import matplotlib.patches as patches 17 | import pylab 18 | from PIL import Image, ImageFont, ImageDraw 19 | 20 | """ 21 | 路径定义 22 | """ 23 | #path list 24 | anchors_path='./model_data/yolo_anchors.txt' 25 | classes_path='./model_data/coco_classes.txt' 26 | img_list_path='./model_data/5k.txt' 27 | img_list_dir="/home/common/datasets/coco/" 28 | imgs_path='/home/common/datasets/coco/images/val2014/' 29 | gt_folder= '/home/common/datasets/coco/annotations/' 30 | res_path='./results/cocoapi_results.json' 31 | res_dir='./results/' 32 | res_imgs_path='./results/pics/' 33 | # load and display instance annotations 34 | dataDir='/home/common/datasets/coco' 35 | dataType='val2014' 36 | annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType) 37 | print ("annFile",annFile) 38 | coco=COCO(annFile) 39 | 40 | 41 | """ 42 | 画框函数 43 | """ 44 | def draw_rectangle(draw, coordinates, color, width=1): 45 | for i in range(width): 46 | rect_start = (coordinates[0][0] - i, coordinates[0][1] - i) 47 | rect_end = (coordinates[1][0] + i, coordinates[1][1] + i) 48 | draw.rectangle((rect_start, rect_end), outline = color) 49 | 50 | 51 | """ 52 | 区分不同类别框的颜色 53 | """ 54 | def id_to_color(id): 55 | #id=id & 63 56 | num=id+1 57 | R=(num%2)*10+(num>>1)%2 58 | G=((num>>1)%2)*10+(num>>1)%2 59 | B=((num>>1)%2)*10+(num>>1)%2 60 | R=(id%7)*13+R*4 61 | G=(id%8)*18+G*6 62 | B=(id%5)*17+B*9 63 | return R,G,B 64 | 65 | 66 | with open(classes_path) as f: 67 | obj_list = f.readlines() 68 | obj_list = [x.strip() for x in obj_list] 69 | 70 | coco_ids= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 71 | 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 72 | 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 73 | 89, 90] 74 | 75 | 76 | """ 77 | 图片结果绘制 78 | """ 79 | with open(res_dir+"cocoapi_results.json") as rf: 80 | rf_list=rf.readlines() 81 | rf_list=[x.strip() for x in rf_list] 82 | 83 | with open(img_list_dir+"5k.txt") as f: 84 | total_img_list = f.readlines() 85 | # remove whitespace characters like `\n` at the end of each line 86 | total_img_list = [x.strip() for x in total_img_list] 87 | total_num_t_img = len(total_img_list) 88 | print("number of images in 5k list: ", total_num_t_img) 89 | gt_num = 0 90 | 91 | for image_path in total_img_list: 92 | gt_num += 1 93 | print(image_path) 94 | img=Image.open(image_path) 95 | draw =ImageDraw.Draw(img) 96 | image_name=int(image_path[50:56]) 97 | print("image_name:",image_name) 98 | 99 | #draw GT bbox,class name,score 100 | imgIds = coco.getImgIds(imgIds = [image_name]) 101 | annIds = coco.getAnnIds(imgIds, iscrowd=None) 102 | anns = coco.loadAnns(annIds) 103 | # coco.showAnns(anns) 104 | # plt.show() 105 | for n in 
range(len(anns)): 106 | print (n) 107 | x, y, w, h = anns[n]['bbox'] 108 | x, y, w, h = int(x), int(y), int(w), int(h) 109 | cat=anns[n]['category_id'] 110 | print("gt-obj:",obj_list[coco_ids.index(cat)]) 111 | print(cat,x, y, w, h) 112 | draw_rectangle(draw, ((x, y), (x + w, y + h)), color=(0,255,0), width=outline_width) 113 | draw.text((x, y-offset_y), obj_list[coco_ids.index(cat)], font=setFont,fill=(0,255,0), width= 0.3) 114 | 115 | 116 | rf_num=0 117 | for rf_dict in rf_list: 118 | rf_dict=ast.literal_eval(rf_dict) 119 | 120 | rf_id=rf_dict['image_id'] 121 | if image_name==rf_id: 122 | rf_num+=1 123 | # print("image_name==rf_id",rf_id) 124 | # print ("rf_index=",rf_num) 125 | x, y, w, h = rf_dict['bbox'] 126 | x, y, w, h = int(x), int(y), int(w), int(h) 127 | print(x, y, w, h) 128 | obj_name=obj_list[coco_ids.index(rf_dict['category_id'])] 129 | print_content=obj_name+" conf:"+str(round(rf_dict['score'],3)) 130 | #outline_width = int(x*y/2000) 131 | outline_width=4 132 | outline_color = id_to_color(rf_dict['category_id']) 133 | #draw_rectangle(draw, ((x, y), (x + w, y + h)), color=outline_color, width=outline_width) 134 | setFont= ImageFont.truetype("./font/FiraMono-Medium.otf", 25, encoding="unic") 135 | offset_y=30 136 | #draw.text((x, y-offset_y), obj_name, font=setFont,fill=id_to_color(rf_dict['category_id']), width= 0.5) 137 | draw_rectangle(draw, ((x, y), (x + w, y + h)), color=(255,0,0), width=outline_width) 138 | draw.text((x, y-offset_y), print_content, font=setFont,fill=(255,0,0), width= 0.3) 139 | img.save(res_imgs_path+str(image_name).zfill(6)+'.jpg') 140 | plt.imshow(img) 141 | plt.show() 142 | print("number of images with gt in 5k list: ", gt_num) 143 | 144 | 145 | -------------------------------------------------------------------------------- /font/FiraMono-Medium.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HulkMaker/tensorflow-keras-yolov3/04a873529e9941a576e7058e47f8991d188ba15b/font/FiraMono-Medium.otf -------------------------------------------------------------------------------- /font/SIL Open Font License.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014, Mozilla Foundation https://mozilla.org/ with Reserved Font Name Fira Mono. 2 | 3 | Copyright (c) 2014, Telefonica S.A. 4 | 5 | This Font Software is licensed under the SIL Open Font License, Version 1.1. 6 | This license is copied below, and is also available with a FAQ at: http://scripts.sil.org/OFL 7 | 8 | ----------------------------------------------------------- 9 | SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007 10 | ----------------------------------------------------------- 11 | 12 | PREAMBLE 13 | The goals of the Open Font License (OFL) are to stimulate worldwide development of collaborative font projects, to support the font creation efforts of academic and linguistic communities, and to provide a free and open framework in which fonts may be shared and improved in partnership with others. 14 | 15 | The OFL allows the licensed fonts to be used, studied, modified and redistributed freely as long as they are not sold by themselves. The fonts, including any derivative works, can be bundled, embedded, redistributed and/or sold with any software provided that any reserved names are not used by derivative works. The fonts and derivatives, however, cannot be released under any other type of license. 
The requirement for fonts to remain under this license does not apply to any document created using the fonts or their derivatives. 16 | 17 | DEFINITIONS 18 | "Font Software" refers to the set of files released by the Copyright Holder(s) under this license and clearly marked as such. This may include source files, build scripts and documentation. 19 | 20 | "Reserved Font Name" refers to any names specified as such after the copyright statement(s). 21 | 22 | "Original Version" refers to the collection of Font Software components as distributed by the Copyright Holder(s). 23 | 24 | "Modified Version" refers to any derivative made by adding to, deleting, or substituting -- in part or in whole -- any of the components of the Original Version, by changing formats or by porting the Font Software to a new environment. 25 | 26 | "Author" refers to any designer, engineer, programmer, technical writer or other person who contributed to the Font Software. 27 | 28 | PERMISSION & CONDITIONS 29 | Permission is hereby granted, free of charge, to any person obtaining a copy of the Font Software, to use, study, copy, merge, embed, modify, redistribute, and sell modified and unmodified copies of the Font Software, subject to the following conditions: 30 | 31 | 1) Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself. 32 | 33 | 2) Original or Modified Versions of the Font Software may be bundled, redistributed and/or sold with any software, provided that each copy contains the above copyright notice and this license. These can be included either as stand-alone text files, human-readable headers or in the appropriate machine-readable metadata fields within text or binary files as long as those fields can be easily viewed by the user. 34 | 35 | 3) No Modified Version of the Font Software may use the Reserved Font Name(s) unless explicit written permission is granted by the corresponding Copyright Holder. This restriction only applies to the primary font name as presented to the users. 36 | 37 | 4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font Software shall not be used to promote, endorse or advertise any Modified Version, except to acknowledge the contribution(s) of the Copyright Holder(s) and the Author(s) or with their explicit written permission. 38 | 39 | 5) The Font Software, modified or unmodified, in part or in whole, must be distributed entirely under this license, and must not be distributed under any other license. The requirement for fonts to remain under this license does not apply to any document created using the Font Software. 40 | 41 | TERMINATION 42 | This license becomes null and void if any of the above conditions are not met. 43 | 44 | DISCLAIMER 45 | THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER DEALINGS IN THE FONT SOFTWARE. 
-------------------------------------------------------------------------------- /kmeans.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | import numpy as np 6 | """ 7 | 使用K-means算法计算锚点的最优选择 8 | """ 9 | class YOLO_Kmeans: 10 | 11 | def __init__(self, cluster_number, filename): 12 | self.cluster_number = cluster_number 13 | self.filename = "2012_train.txt" 14 | 15 | def iou(self, boxes, clusters): # 1 box -> k clusters 16 | n = boxes.shape[0] 17 | k = self.cluster_number 18 | 19 | box_area = boxes[:, 0] * boxes[:, 1] 20 | box_area = box_area.repeat(k) 21 | box_area = np.reshape(box_area, (n, k)) 22 | 23 | cluster_area = clusters[:, 0] * clusters[:, 1] 24 | cluster_area = np.tile(cluster_area, [1, n]) 25 | cluster_area = np.reshape(cluster_area, (n, k)) 26 | 27 | box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k)) 28 | cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k)) 29 | min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix) 30 | 31 | box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k)) 32 | cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k)) 33 | min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix) 34 | inter_area = np.multiply(min_w_matrix, min_h_matrix) 35 | 36 | result = inter_area / (box_area + cluster_area - inter_area) 37 | return result 38 | 39 | def avg_iou(self, boxes, clusters): 40 | accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)]) 41 | return accuracy 42 | 43 | def kmeans(self, boxes, k, dist=np.median): 44 | box_number = boxes.shape[0] 45 | distances = np.empty((box_number, k)) 46 | last_nearest = np.zeros((box_number,)) 47 | np.random.seed() 48 | clusters = boxes[np.random.choice( 49 | box_number, k, replace=False)] # init k clusters 50 | while True: 51 | 52 | distances = 1 - self.iou(boxes, clusters) 53 | 54 | current_nearest = np.argmin(distances, axis=1) 55 | if (last_nearest == current_nearest).all(): 56 | break # clusters won't change 57 | for cluster in range(k): 58 | clusters[cluster] = dist( # update clusters 59 | boxes[current_nearest == cluster], axis=0) 60 | 61 | last_nearest = current_nearest 62 | 63 | return clusters 64 | 65 | def result2txt(self, data): 66 | f = open("yolo_anchors.txt", 'w') 67 | row = np.shape(data)[0] 68 | for i in range(row): 69 | if i == 0: 70 | x_y = "%d,%d" % (data[i][0], data[i][1]) 71 | else: 72 | x_y = ", %d,%d" % (data[i][0], data[i][1]) 73 | f.write(x_y) 74 | f.close() 75 | 76 | def txt2boxes(self): 77 | f = open(self.filename, 'r') 78 | dataSet = [] 79 | for line in f: 80 | infos = line.split(" ") 81 | length = len(infos) 82 | for i in range(1, length): 83 | width = int(infos[i].split(",")[2]) - \ 84 | int(infos[i].split(",")[0]) 85 | height = int(infos[i].split(",")[3]) - \ 86 | int(infos[i].split(",")[1]) 87 | dataSet.append([width, height]) 88 | result = np.array(dataSet) 89 | f.close() 90 | return result 91 | 92 | def txt2clusters(self): 93 | all_boxes = self.txt2boxes() 94 | result = self.kmeans(all_boxes, k=self.cluster_number) 95 | result = result[np.lexsort(result.T[0, None])] 96 | self.result2txt(result) 97 | print("K anchors:\n {}".format(result)) 98 | print("Accuracy: {:.2f}%".format( 99 | self.avg_iou(all_boxes, result) * 100)) 100 | 101 | 102 | if __name__ == "__main__": 103 | cluster_number = 9 104 | filename = "2012_train.txt" 105 | kmeans = YOLO_Kmeans(cluster_number, filename) 106 | kmeans.txt2clusters() 107 | 
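# Usage note (sketch): the anchors written to yolo_anchors.txt here can replace
# model_data/yolo_anchors.txt so that train.py and yolo_video.py pick up the new
# anchors; per the README, custom anchors may require further changes elsewhere.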
-------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /model_data/tiny_yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 2 | -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor 21 | -------------------------------------------------------------------------------- /model_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 2 | -------------------------------------------------------------------------------- /pycocoEval.py: -------------------------------------------------------------------------------- 1 | #-*- coding:utf-8 -*- 2 | """ 3 | Created on April, 2019 4 | @authors: Hulking 5 | """ 6 | """ 7 | 计算mAP 8 | """ 9 | import matplotlib.pyplot as plt 10 | from pycocotools.coco import COCO 11 | from pycocotools.cocoeval import COCOeval 12 | import numpy as np 13 | import skimage.io as io 14 | import pylab,json 15 | pylab.rcParams['figure.figsize'] = (10.0, 8.0) 16 | def get_img_id(file_name): 17 | ls = [] 18 | myset = [] 19 | annos = json.load(open(file_name, 'r')) 20 | for anno in annos: 21 | ls.append(anno['image_id']) 22 | myset = {}.fromkeys(ls).keys() 23 | return myset 24 | def cal_coco_map(): 25 | annType = ['segm', 'bbox', 'keypoints'] 26 | annType = annType[1] 27 | cocoGt_file = '/home/common/datasets/coco/annotations/instances_val2014.json' 28 | cocoGt = COCO(cocoGt_file) 29 | cocoDt_file = './results/cocoapi_results.json' 30 | imgIds = get_img_id(cocoDt_file) 31 | print (len(imgIds)) 32 | cocoDt = cocoGt.loadRes(cocoDt_file) 33 | imgIds = sorted(imgIds) 34 | imgIds = imgIds[0:5000] 35 | cocoEval = COCOeval(cocoGt, cocoDt, annType) 36 | cocoEval.params.imgIds = imgIds 37 | cocoEval.evaluate() 38 | cocoEval.accumulate() 39 | cocoEval.summarize() 40 | 41 | if __name__ == '__main__': 42 | 
cal_coco_map() -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | """ 2 | @authors: Hulking 3 | April 2019 4 | """ 5 | 6 | import numpy as np 7 | import keras.backend as K 8 | from keras.layers import Input, Lambda 9 | from keras.models import Model 10 | from keras.optimizers import Adam 11 | from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping 12 | 13 | from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss 14 | from yolo3.utils import get_random_data 15 | 16 | """ 17 | 训练流程 18 | """ 19 | def _main(): 20 | annotation_path = 'train.txt' 21 | log_dir = 'logs/000/' 22 | classes_path = 'model_data/voc_classes.txt' 23 | anchors_path = 'model_data/yolo_anchors.txt' 24 | class_names = get_classes(classes_path) 25 | num_classes = len(class_names) 26 | anchors = get_anchors(anchors_path) 27 | 28 | input_shape = (416,416) # multiple of 32, hw 29 | 30 | is_tiny_version = len(anchors)==6 # default setting 31 | if is_tiny_version: 32 | model = create_tiny_model(input_shape, anchors, num_classes, 33 | freeze_body=2, weights_path='model_data/tiny_yolo_weights.h5') 34 | else: 35 | model = create_model(input_shape, anchors, num_classes, 36 | freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze 37 | 38 | logging = TensorBoard(log_dir=log_dir) 39 | checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', 40 | monitor='val_loss', save_weights_only=True, save_best_only=True, period=3) 41 | reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1) 42 | early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1) 43 | 44 | val_split = 0.1 45 | with open(annotation_path) as f: 46 | lines = f.readlines() 47 | np.random.seed(10101) 48 | np.random.shuffle(lines) 49 | np.random.seed(None) 50 | num_val = int(len(lines)*val_split) 51 | num_train = len(lines) - num_val 52 | 53 | # Train with frozen layers first, to get a stable loss. 54 | # Adjust num epochs to your dataset. This step is enough to obtain a not bad model. 55 | if True: 56 | model.compile(optimizer=Adam(lr=1e-3), loss={ 57 | # use custom yolo_loss Lambda layer. 58 | 'yolo_loss': lambda y_true, y_pred: y_pred}) 59 | 60 | batch_size = 32 61 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 62 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 63 | steps_per_epoch=max(1, num_train//batch_size), 64 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 65 | validation_steps=max(1, num_val//batch_size), 66 | epochs=50, 67 | initial_epoch=0, 68 | callbacks=[logging, checkpoint]) 69 | model.save_weights(log_dir + 'trained_weights_stage_1.h5') 70 | 71 | # Unfreeze and continue training, to fine-tune. 72 | # Train longer if the result is not good. 
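    # Note: in both training stages the model's output is already the scalar
    # yolo_loss (a Lambda layer added in create_model), so the Keras loss simply
    # returns y_pred and the data generator feeds zeros as the dummy y_true.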
73 | if True: 74 | for i in range(len(model.layers)): 75 | model.layers[i].trainable = True 76 | model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change 77 | print('Unfreeze all of the layers.') 78 | 79 | batch_size = 32 # note that more GPU memory is required after unfreezing the body 80 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 81 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 82 | steps_per_epoch=max(1, num_train//batch_size), 83 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 84 | validation_steps=max(1, num_val//batch_size), 85 | epochs=100, 86 | initial_epoch=50, 87 | callbacks=[logging, checkpoint, reduce_lr, early_stopping]) 88 | model.save_weights(log_dir + 'trained_weights_final.h5') 89 | 90 | # Further training if needed. 91 | 92 | """ 93 | 获取类别 94 | """ 95 | def get_classes(classes_path): 96 | '''loads the classes''' 97 | with open(classes_path) as f: 98 | class_names = f.readlines() 99 | class_names = [c.strip() for c in class_names] 100 | return class_names 101 | """ 102 | 获取锚点 103 | """ 104 | def get_anchors(anchors_path): 105 | '''loads the anchors from a file''' 106 | with open(anchors_path) as f: 107 | anchors = f.readline() 108 | anchors = [float(x) for x in anchors.split(',')] 109 | return np.array(anchors).reshape(-1, 2) 110 | 111 | """ 112 | 创建模型 113 | """ 114 | def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2, 115 | weights_path='model_data/yolo_weights.h5'): 116 | '''create the training model''' 117 | K.clear_session() # get a new session 118 | image_input = Input(shape=(None, None, 3)) 119 | h, w = input_shape 120 | num_anchors = len(anchors) 121 | 122 | y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \ 123 | num_anchors//3, num_classes+5)) for l in range(3)] 124 | 125 | model_body = yolo_body(image_input, num_anchors//3, num_classes) 126 | print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes)) 127 | 128 | if load_pretrained: 129 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True) 130 | print('Load weights {}.'.format(weights_path)) 131 | if freeze_body in [1, 2]: 132 | # Freeze darknet53 body or freeze all but 3 output layers. 
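        # freeze_body=1 freezes the first 185 layers (the DarkNet-53 body);
        # freeze_body=2 freezes everything except the last 3 detection layers.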
133 | num = (185, len(model_body.layers)-3)[freeze_body-1] 134 | for i in range(num): model_body.layers[i].trainable = False 135 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 136 | 137 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 138 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})( 139 | [*model_body.output, *y_true]) 140 | model = Model([model_body.input, *y_true], model_loss) 141 | 142 | return model 143 | """ 144 | 创建阉割版模型 145 | """ 146 | def create_tiny_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2, 147 | weights_path='model_data/tiny_yolo_weights.h5'): 148 | '''create the training model, for Tiny YOLOv3''' 149 | K.clear_session() # get a new session 150 | image_input = Input(shape=(None, None, 3)) 151 | h, w = input_shape 152 | num_anchors = len(anchors) 153 | 154 | y_true = [Input(shape=(h//{0:32, 1:16}[l], w//{0:32, 1:16}[l], \ 155 | num_anchors//2, num_classes+5)) for l in range(2)] 156 | 157 | model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes) 158 | print('Create Tiny YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes)) 159 | 160 | if load_pretrained: 161 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True) 162 | print('Load weights {}.'.format(weights_path)) 163 | if freeze_body in [1, 2]: 164 | # Freeze the darknet body or freeze all but 2 output layers. 165 | num = (20, len(model_body.layers)-2)[freeze_body-1] 166 | for i in range(num): model_body.layers[i].trainable = False 167 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 168 | 169 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 170 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.7})( 171 | [*model_body.output, *y_true]) 172 | model = Model([model_body.input, *y_true], model_loss) 173 | 174 | return model 175 | """ 176 | 真值数据生成 177 | """ 178 | def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes): 179 | '''data generator for fit_generator''' 180 | n = len(annotation_lines) 181 | i = 0 182 | while True: 183 | image_data = [] 184 | box_data = [] 185 | for b in range(batch_size): 186 | if i==0: 187 | np.random.shuffle(annotation_lines) 188 | image, box = get_random_data(annotation_lines[i], input_shape, random=True) 189 | image_data.append(image) 190 | box_data.append(box) 191 | i = (i+1) % n 192 | image_data = np.array(image_data) 193 | box_data = np.array(box_data) 194 | y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) 195 | yield [image_data, *y_true], np.zeros(batch_size) 196 | 197 | def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes): 198 | n = len(annotation_lines) 199 | if n==0 or batch_size<=0: return None 200 | return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes) 201 | 202 | if __name__ == '__main__': 203 | _main() 204 | -------------------------------------------------------------------------------- /train_bottleneck.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | """ 6 | Retrain the YOLO model for your own dataset. 
7 | """ 8 | import os 9 | import numpy as np 10 | import keras.backend as K 11 | from keras.layers import Input, Lambda 12 | from keras.models import Model 13 | from keras.optimizers import Adam 14 | from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping 15 | 16 | from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss 17 | from yolo3.utils import get_random_data 18 | 19 | 20 | def _main(): 21 | annotation_path = 'train.txt' 22 | log_dir = 'logs/000/' 23 | classes_path = 'model_data/coco_classes.txt' 24 | anchors_path = 'model_data/yolo_anchors.txt' 25 | class_names = get_classes(classes_path) 26 | num_classes = len(class_names) 27 | anchors = get_anchors(anchors_path) 28 | 29 | input_shape = (416,416) # multiple of 32, hw 30 | 31 | model, bottleneck_model, last_layer_model = create_model(input_shape, anchors, num_classes, 32 | freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze 33 | 34 | logging = TensorBoard(log_dir=log_dir) 35 | checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', 36 | monitor='val_loss', save_weights_only=True, save_best_only=True, period=3) 37 | reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1) 38 | early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1) 39 | 40 | val_split = 0.1 41 | with open(annotation_path) as f: 42 | lines = f.readlines() 43 | np.random.seed(10101) 44 | np.random.shuffle(lines) 45 | np.random.seed(None) 46 | num_val = int(len(lines)*val_split) 47 | num_train = len(lines) - num_val 48 | 49 | # Train with frozen layers first, to get a stable loss. 50 | # Adjust num epochs to your dataset. This step is enough to obtain a not bad model. 
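    # Stage outline: (1) run the frozen body once over all images and cache its
    # three output feature maps ("bottlenecks") in bottlenecks.npz, (2) train only
    # the last detection layers on those cached features, (3) train the partially
    # frozen model on augmented data, then unfreeze everything and fine-tune.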
51 | if True: 52 | # perform bottleneck training 53 | if not os.path.isfile("bottlenecks.npz"): 54 | print("calculating bottlenecks") 55 | batch_size=8 56 | bottlenecks=bottleneck_model.predict_generator(data_generator_wrapper(lines, batch_size, input_shape, anchors, num_classes, random=False, verbose=True), 57 | steps=(len(lines)//batch_size)+1, max_queue_size=1) 58 | np.savez("bottlenecks.npz", bot0=bottlenecks[0], bot1=bottlenecks[1], bot2=bottlenecks[2]) 59 | 60 | # load bottleneck features from file 61 | dict_bot=np.load("bottlenecks.npz") 62 | bottlenecks_train=[dict_bot["bot0"][:num_train], dict_bot["bot1"][:num_train], dict_bot["bot2"][:num_train]] 63 | bottlenecks_val=[dict_bot["bot0"][num_train:], dict_bot["bot1"][num_train:], dict_bot["bot2"][num_train:]] 64 | 65 | # train last layers with fixed bottleneck features 66 | batch_size=8 67 | print("Training last layers with bottleneck features") 68 | print('with {} samples, val on {} samples and batch size {}.'.format(num_train, num_val, batch_size)) 69 | last_layer_model.compile(optimizer='adam', loss={'yolo_loss': lambda y_true, y_pred: y_pred}) 70 | last_layer_model.fit_generator(bottleneck_generator(lines[:num_train], batch_size, input_shape, anchors, num_classes, bottlenecks_train), 71 | steps_per_epoch=max(1, num_train//batch_size), 72 | validation_data=bottleneck_generator(lines[num_train:], batch_size, input_shape, anchors, num_classes, bottlenecks_val), 73 | validation_steps=max(1, num_val//batch_size), 74 | epochs=30, 75 | initial_epoch=0, max_queue_size=1) 76 | model.save_weights(log_dir + 'trained_weights_stage_0.h5') 77 | 78 | # train last layers with random augmented data 79 | model.compile(optimizer=Adam(lr=1e-3), loss={ 80 | # use custom yolo_loss Lambda layer. 81 | 'yolo_loss': lambda y_true, y_pred: y_pred}) 82 | batch_size = 16 83 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 84 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 85 | steps_per_epoch=max(1, num_train//batch_size), 86 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 87 | validation_steps=max(1, num_val//batch_size), 88 | epochs=50, 89 | initial_epoch=0, 90 | callbacks=[logging, checkpoint]) 91 | model.save_weights(log_dir + 'trained_weights_stage_1.h5') 92 | 93 | # Unfreeze and continue training, to fine-tune. 94 | # Train longer if the result is not good. 95 | if True: 96 | for i in range(len(model.layers)): 97 | model.layers[i].trainable = True 98 | model.compile(optimizer=Adam(lr=1e-4), loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change 99 | print('Unfreeze all of the layers.') 100 | 101 | batch_size = 4 # note that more GPU memory is required after unfreezing the body 102 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 103 | model.fit_generator(data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes), 104 | steps_per_epoch=max(1, num_train//batch_size), 105 | validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes), 106 | validation_steps=max(1, num_val//batch_size), 107 | epochs=100, 108 | initial_epoch=50, 109 | callbacks=[logging, checkpoint, reduce_lr, early_stopping]) 110 | model.save_weights(log_dir + 'trained_weights_final.h5') 111 | 112 | # Further training if needed. 
113 | 114 | 115 | def get_classes(classes_path): 116 | '''loads the classes''' 117 | with open(classes_path) as f: 118 | class_names = f.readlines() 119 | class_names = [c.strip() for c in class_names] 120 | return class_names 121 | 122 | def get_anchors(anchors_path): 123 | '''loads the anchors from a file''' 124 | with open(anchors_path) as f: 125 | anchors = f.readline() 126 | anchors = [float(x) for x in anchors.split(',')] 127 | return np.array(anchors).reshape(-1, 2) 128 | 129 | 130 | def create_model(input_shape, anchors, num_classes, load_pretrained=True, freeze_body=2, 131 | weights_path='model_data/yolo_weights.h5'): 132 | '''create the training model''' 133 | K.clear_session() # get a new session 134 | image_input = Input(shape=(None, None, 3)) 135 | h, w = input_shape 136 | num_anchors = len(anchors) 137 | 138 | y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \ 139 | num_anchors//3, num_classes+5)) for l in range(3)] 140 | 141 | model_body = yolo_body(image_input, num_anchors//3, num_classes) 142 | print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes)) 143 | 144 | if load_pretrained: 145 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True) 146 | print('Load weights {}.'.format(weights_path)) 147 | if freeze_body in [1, 2]: 148 | # Freeze darknet53 body or freeze all but 3 output layers. 149 | num = (185, len(model_body.layers)-3)[freeze_body-1] 150 | for i in range(num): model_body.layers[i].trainable = False 151 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 152 | 153 | # get output of second last layers and create bottleneck model of it 154 | out1=model_body.layers[246].output 155 | out2=model_body.layers[247].output 156 | out3=model_body.layers[248].output 157 | bottleneck_model = Model([model_body.input, *y_true], [out1, out2, out3]) 158 | 159 | # create last layer model of last layers from yolo model 160 | in0 = Input(shape=bottleneck_model.output[0].shape[1:].as_list()) 161 | in1 = Input(shape=bottleneck_model.output[1].shape[1:].as_list()) 162 | in2 = Input(shape=bottleneck_model.output[2].shape[1:].as_list()) 163 | last_out0=model_body.layers[249](in0) 164 | last_out1=model_body.layers[250](in1) 165 | last_out2=model_body.layers[251](in2) 166 | model_last=Model(inputs=[in0, in1, in2], outputs=[last_out0, last_out1, last_out2]) 167 | model_loss_last =Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 168 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})( 169 | [*model_last.output, *y_true]) 170 | last_layer_model = Model([in0,in1,in2, *y_true], model_loss_last) 171 | 172 | 173 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 174 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})( 175 | [*model_body.output, *y_true]) 176 | model = Model([model_body.input, *y_true], model_loss) 177 | 178 | return model, bottleneck_model, last_layer_model 179 | 180 | def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, random=True, verbose=False): 181 | '''data generator for fit_generator''' 182 | n = len(annotation_lines) 183 | i = 0 184 | while True: 185 | image_data = [] 186 | box_data = [] 187 | for b in range(batch_size): 188 | if i==0 and random: 189 | np.random.shuffle(annotation_lines) 190 | image, box = get_random_data(annotation_lines[i], input_shape, random=random) 191 | image_data.append(image) 192 | box_data.append(box) 193 | i 
= (i+1) % n 194 | image_data = np.array(image_data) 195 | if verbose: 196 | print("Progress: ",i,"/",n) 197 | box_data = np.array(box_data) 198 | y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) 199 | yield [image_data, *y_true], np.zeros(batch_size) 200 | 201 | def data_generator_wrapper(annotation_lines, batch_size, input_shape, anchors, num_classes, random=True, verbose=False): 202 | n = len(annotation_lines) 203 | if n==0 or batch_size<=0: return None 204 | return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, random, verbose) 205 | 206 | def bottleneck_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, bottlenecks): 207 | n = len(annotation_lines) 208 | i = 0 209 | while True: 210 | box_data = [] 211 | b0=np.zeros((batch_size,bottlenecks[0].shape[1],bottlenecks[0].shape[2],bottlenecks[0].shape[3])) 212 | b1=np.zeros((batch_size,bottlenecks[1].shape[1],bottlenecks[1].shape[2],bottlenecks[1].shape[3])) 213 | b2=np.zeros((batch_size,bottlenecks[2].shape[1],bottlenecks[2].shape[2],bottlenecks[2].shape[3])) 214 | for b in range(batch_size): 215 | _, box = get_random_data(annotation_lines[i], input_shape, random=False, proc_img=False) 216 | box_data.append(box) 217 | b0[b]=bottlenecks[0][i] 218 | b1[b]=bottlenecks[1][i] 219 | b2[b]=bottlenecks[2][i] 220 | i = (i+1) % n 221 | box_data = np.array(box_data) 222 | y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes) 223 | yield [b0, b1, b2, *y_true], np.zeros(batch_size) 224 | 225 | if __name__ == '__main__': 226 | _main() 227 | -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | import xml.etree.ElementTree as ET 6 | from os import getcwd 7 | 8 | sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')] 9 | 10 | classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] 11 | 12 | 13 | def convert_annotation(year, image_id, list_file): 14 | in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id)) 15 | tree=ET.parse(in_file) 16 | root = tree.getroot() 17 | 18 | for obj in root.iter('object'): 19 | difficult = obj.find('difficult').text 20 | cls = obj.find('name').text 21 | if cls not in classes or int(difficult)==1: 22 | continue 23 | cls_id = classes.index(cls) 24 | xmlbox = obj.find('bndbox') 25 | b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text)) 26 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 27 | 28 | wd = getcwd() 29 | 30 | for year, image_set in sets: 31 | image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split() 32 | list_file = open('%s_%s.txt'%(year, image_set), 'w') 33 | for image_id in image_ids: 34 | list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg'%(wd, year, image_id)) 35 | convert_annotation(year, image_id, list_file) 36 | list_file.write('\n') 37 | list_file.close() 38 | 39 | -------------------------------------------------------------------------------- /yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on April, 2019 4 | 
@authors: Hulking 5 | """ 6 | import colorsys 7 | import os 8 | from timeit import default_timer as timer 9 | 10 | import numpy as np 11 | from keras import backend as K 12 | from keras.models import load_model 13 | from keras.layers import Input 14 | from PIL import Image, ImageFont, ImageDraw 15 | 16 | from yolo3.model import yolo_eval, yolo_body, tiny_yolo_body 17 | from yolo3.utils import letterbox_image 18 | import os 19 | from keras.utils import multi_gpu_model 20 | 21 | """ 22 | YOLO类 包含了针对模型的基本操作 23 | """ 24 | class YOLO(object): 25 | _defaults = { 26 | "model_path": 'model_data/yolo.h5', 27 | "anchors_path": 'model_data/yolo_anchors.txt', 28 | "classes_path": 'model_data/coco_classes.txt', 29 | "score" : 0.001, 30 | "iou" : 0.5, 31 | "model_image_size" : (416, 416), 32 | "gpu_num" : 4, 33 | } 34 | 35 | @classmethod 36 | def get_defaults(cls, n): 37 | if n in cls._defaults: 38 | return cls._defaults[n] 39 | else: 40 | return "Unrecognized attribute name '" + n + "'" 41 | 42 | def __init__(self, **kwargs): 43 | self.__dict__.update(self._defaults) # set up default values 44 | self.__dict__.update(kwargs) # and update with user overrides 45 | self.class_names = self._get_class() 46 | self.anchors = self._get_anchors() 47 | self.sess = K.get_session() 48 | self.boxes, self.scores, self.classes = self.generate() 49 | 50 | def _get_class(self): 51 | classes_path = os.path.expanduser(self.classes_path) 52 | with open(classes_path) as f: 53 | class_names = f.readlines() 54 | class_names = [c.strip() for c in class_names] 55 | return class_names 56 | 57 | def _get_anchors(self): 58 | anchors_path = os.path.expanduser(self.anchors_path) 59 | with open(anchors_path) as f: 60 | anchors = f.readline() 61 | anchors = [float(x) for x in anchors.split(',')] 62 | return np.array(anchors).reshape(-1, 2) 63 | 64 | def generate(self): 65 | model_path = os.path.expanduser(self.model_path) 66 | assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.' 67 | 68 | # Load model, or construct model and load weights. 69 | num_anchors = len(self.anchors) 70 | num_classes = len(self.class_names) 71 | is_tiny_version = num_anchors==6 # default setting 72 | try: 73 | self.yolo_model = load_model(model_path, compile=False) 74 | except: 75 | self.yolo_model = tiny_yolo_body(Input(shape=(None,None,3)), num_anchors//2, num_classes) \ 76 | if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes) 77 | self.yolo_model.load_weights(self.model_path) # make sure model, anchors and classes match 78 | else: 79 | assert self.yolo_model.layers[-1].output_shape[-1] == \ 80 | num_anchors/len(self.yolo_model.output) * (num_classes + 5), \ 81 | 'Mismatch between model and given anchor and class sizes' 82 | 83 | print('{} model, anchors, and classes loaded.'.format(model_path)) 84 | 85 | # Generate colors for drawing bounding boxes. 86 | hsv_tuples = [(x / len(self.class_names), 1., 1.) 87 | for x in range(len(self.class_names))] 88 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 89 | self.colors = list( 90 | map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 91 | self.colors)) 92 | np.random.seed(10101) # Fixed seed for consistent colors across runs. 93 | np.random.shuffle(self.colors) # Shuffle colors to decorrelate adjacent classes. 94 | np.random.seed(None) # Reset seed to default. 95 | 96 | # Generate output tensor targets for filtered bounding boxes. 
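        # What follows builds the evaluation graph once: a 2-element placeholder
        # carries the original image (height, width), the model is wrapped with
        # multi_gpu_model when gpu_num >= 2, and yolo_eval converts the raw output
        # tensors into filtered boxes, scores and class indices using the score
        # and iou thresholds configured in _defaults.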
97 | self.input_image_shape = K.placeholder(shape=(2, )) 98 | if self.gpu_num>=2: 99 | self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num) 100 | boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors, 101 | len(self.class_names), self.input_image_shape, 102 | score_threshold=self.score, iou_threshold=self.iou) 103 | return boxes, scores, classes 104 | 105 | """ 106 | 单张图片的预测: 107 | 输入:单张图片 108 | 返回:绘制了检测框 类别 概率的图片 109 | """ 110 | def detect_image(self, image): 111 | start = timer() 112 | 113 | if self.model_image_size != (None, None): 114 | assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required' 115 | assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required' 116 | boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size))) 117 | else: 118 | new_image_size = (image.width - (image.width % 32), 119 | image.height - (image.height % 32)) 120 | boxed_image = letterbox_image(image, new_image_size) 121 | image_data = np.array(boxed_image, dtype='float32') 122 | 123 | print(image_data.shape) 124 | image_data /= 255. 125 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 126 | 127 | out_boxes, out_scores, out_classes = self.sess.run( 128 | [self.boxes, self.scores, self.classes], 129 | feed_dict={ 130 | self.yolo_model.input: image_data, 131 | self.input_image_shape: [image.size[1], image.size[0]], 132 | K.learning_phase(): 0 133 | }) 134 | 135 | print('Found {} boxes for {}'.format(len(out_boxes), 'img')) 136 | 137 | font = ImageFont.truetype(font='font/FiraMono-Medium.otf', 138 | size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 139 | thickness = (image.size[0] + image.size[1]) // 300 140 | 141 | for i, c in reversed(list(enumerate(out_classes))): 142 | predicted_class = self.class_names[c] 143 | box = out_boxes[i] 144 | score = out_scores[i] 145 | 146 | label = '{} {:.2f}'.format(predicted_class, score) 147 | draw = ImageDraw.Draw(image) 148 | label_size = draw.textsize(label, font) 149 | 150 | top, left, bottom, right = box 151 | top = max(0, np.floor(top + 0.5).astype('int32')) 152 | left = max(0, np.floor(left + 0.5).astype('int32')) 153 | bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32')) 154 | right = min(image.size[0], np.floor(right + 0.5).astype('int32')) 155 | print(label, (left, top), (right, bottom)) 156 | 157 | if top - label_size[1] >= 0: 158 | text_origin = np.array([left, top - label_size[1]]) 159 | else: 160 | text_origin = np.array([left, top + 1]) 161 | 162 | # My kingdom for a good redistributable image drawing library. 
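            # The box border below is emulated by drawing `thickness` nested
            # 1-pixel rectangles, and the label is rendered on a filled rectangle
            # anchored at text_origin (above the box when there is room,
            # otherwise just inside its top edge).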
163 | for i in range(thickness): 164 | draw.rectangle( 165 | [left + i, top + i, right - i, bottom - i], 166 | outline=self.colors[c]) 167 | draw.rectangle( 168 | [tuple(text_origin), tuple(text_origin + label_size)], 169 | fill=self.colors[c]) 170 | draw.text(text_origin, label, fill=(0, 0, 0), font=font) 171 | del draw 172 | 173 | end = timer() 174 | print(end - start) 175 | return image 176 | 177 | 178 | """ 179 | 单张图片的预测: 180 | 输入:单张图片 181 | 返回:检测框左上角和右下角的点坐标 分数 类别 182 | """ 183 | def valid_image(self, image): 184 | start = timer() 185 | 186 | if self.model_image_size != (None, None): 187 | assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required' 188 | assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required' 189 | boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size))) 190 | else: 191 | new_image_size = (image.width - (image.width % 32), 192 | image.height - (image.height % 32)) 193 | boxed_image = letterbox_image(image, new_image_size) 194 | image_data = np.array(boxed_image, dtype='float32') 195 | 196 | # print(image_data.shape) 197 | image_data /= 255. 198 | image_data = np.expand_dims(image_data, 0) # Add batch dimension. 199 | 200 | out_boxes, out_scores, out_classes = self.sess.run( 201 | [self.boxes, self.scores, self.classes], 202 | feed_dict={ 203 | self.yolo_model.input: image_data, 204 | self.input_image_shape: [image.size[1], image.size[0]], 205 | K.learning_phase(): 0 206 | }) 207 | 208 | end = timer() 209 | print("time:",end - start) 210 | return out_boxes, out_scores, out_classes 211 | 212 | def close_session(self): 213 | self.sess.close() 214 | 215 | """ 216 | 视频预测: 217 | 输入:视频路径 218 | 返回:绘制了预测结果的视频 219 | """ 220 | def detect_video(yolo, video_path, output_path=""): 221 | import cv2 222 | vid = cv2.VideoCapture(video_path) 223 | if not vid.isOpened(): 224 | raise IOError("Couldn't open webcam or video") 225 | video_FourCC = int(vid.get(cv2.CAP_PROP_FOURCC)) 226 | video_fps = vid.get(cv2.CAP_PROP_FPS) 227 | video_size = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)), 228 | int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))) 229 | isOutput = True if output_path != "" else False 230 | if isOutput: 231 | print("!!! TYPE:", type(output_path), type(video_FourCC), type(video_fps), type(video_size)) 232 | out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size) 233 | accum_time = 0 234 | curr_fps = 0 235 | fps = "FPS: ??" 
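    # Main loop (below): read a frame, run yolo.detect_image on it, overlay a
    # running FPS counter (frames completed during the last accumulated second),
    # display the result in an OpenCV window, and optionally write it to the
    # output video until 'q' is pressed.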
236 | prev_time = timer() 237 | while True: 238 | return_value, frame = vid.read() 239 | image = Image.fromarray(frame) 240 | image = yolo.detect_image(image) 241 | result = np.asarray(image) 242 | curr_time = timer() 243 | exec_time = curr_time - prev_time 244 | prev_time = curr_time 245 | accum_time = accum_time + exec_time 246 | curr_fps = curr_fps + 1 247 | if accum_time > 1: 248 | accum_time = accum_time - 1 249 | fps = "FPS: " + str(curr_fps) 250 | curr_fps = 0 251 | cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX, 252 | fontScale=0.50, color=(255, 0, 0), thickness=2) 253 | cv2.namedWindow("result", cv2.WINDOW_NORMAL) 254 | cv2.imshow("result", result) 255 | if isOutput: 256 | out.write(result) 257 | if cv2.waitKey(1) & 0xFF == ord('q'): 258 | break 259 | yolo.close_session() 260 | 261 | -------------------------------------------------------------------------------- /yolo3/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HulkMaker/tensorflow-keras-yolov3/04a873529e9941a576e7058e47f8991d188ba15b/yolo3/__init__.py -------------------------------------------------------------------------------- /yolo3/model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | """YOLO_v3 Model Defined in Keras.""" 6 | 7 | from functools import wraps 8 | 9 | import numpy as np 10 | import tensorflow as tf 11 | from keras import backend as K 12 | from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D 13 | from keras.layers.advanced_activations import LeakyReLU 14 | from keras.layers.normalization import BatchNormalization 15 | from keras.models import Model 16 | from keras.regularizers import l2 17 | 18 | from yolo3.utils import compose 19 | 20 | """ 21 | 模型的组件 22 | """ 23 | @wraps(Conv2D) 24 | def DarknetConv2D(*args, **kwargs): 25 | """Wrapper to set Darknet parameters for Convolution2D.""" 26 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)} 27 | darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same' 28 | darknet_conv_kwargs.update(kwargs) 29 | return Conv2D(*args, **darknet_conv_kwargs) 30 | 31 | def DarknetConv2D_BN_Leaky(*args, **kwargs): 32 | """Darknet Convolution2D followed by BatchNormalization and LeakyReLU.""" 33 | no_bias_kwargs = {'use_bias': False} 34 | no_bias_kwargs.update(kwargs) 35 | return compose( 36 | DarknetConv2D(*args, **no_bias_kwargs), 37 | BatchNormalization(), 38 | LeakyReLU(alpha=0.1)) 39 | """ 40 | 残差 41 | """ 42 | def resblock_body(x, num_filters, num_blocks): 43 | '''A series of resblocks starting with a downsampling Convolution2D''' 44 | # Darknet uses left and top padding instead of 'same' mode 45 | x = ZeroPadding2D(((1,0),(1,0)))(x) 46 | x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x) 47 | for i in range(num_blocks): 48 | y = compose( 49 | DarknetConv2D_BN_Leaky(num_filters//2, (1,1)), 50 | DarknetConv2D_BN_Leaky(num_filters, (3,3)))(x) 51 | x = Add()([x,y]) 52 | return x 53 | """ 54 | 骨干网络darknet 55 | """ 56 | def darknet_body(x): 57 | '''Darknent body having 52 Convolution2D layers''' 58 | x = DarknetConv2D_BN_Leaky(32, (3,3))(x) 59 | x = resblock_body(x, 64, 1) 60 | x = resblock_body(x, 128, 2) 61 | x = resblock_body(x, 256, 8) 62 | x = resblock_body(x, 512, 8) 63 | x = resblock_body(x, 1024, 4) 64 | return x 65 | """ 66 | YOLO主体的组件 67 | """ 68 | def make_last_layers(x, 
num_filters, out_filters): 69 | '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer''' 70 | x = compose( 71 | DarknetConv2D_BN_Leaky(num_filters, (1,1)), 72 | DarknetConv2D_BN_Leaky(num_filters*2, (3,3)), 73 | DarknetConv2D_BN_Leaky(num_filters, (1,1)), 74 | DarknetConv2D_BN_Leaky(num_filters*2, (3,3)), 75 | DarknetConv2D_BN_Leaky(num_filters, (1,1)))(x) 76 | y = compose( 77 | DarknetConv2D_BN_Leaky(num_filters*2, (3,3)), 78 | DarknetConv2D(out_filters, (1,1)))(x) 79 | return x, y 80 | 81 | """ 82 | YOLO主体 83 | """ 84 | def yolo_body(inputs, num_anchors, num_classes): 85 | """Create YOLO_V3 model CNN body in Keras.""" 86 | darknet = Model(inputs, darknet_body(inputs)) 87 | x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5)) 88 | 89 | x = compose( 90 | DarknetConv2D_BN_Leaky(256, (1,1)), 91 | UpSampling2D(2))(x) 92 | x = Concatenate()([x,darknet.layers[152].output]) 93 | x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5)) 94 | 95 | x = compose( 96 | DarknetConv2D_BN_Leaky(128, (1,1)), 97 | UpSampling2D(2))(x) 98 | x = Concatenate()([x,darknet.layers[92].output]) 99 | x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5)) 100 | 101 | return Model(inputs, [y1,y2,y3]) 102 | """ 103 | 阉割版的YOLO 104 | """ 105 | def tiny_yolo_body(inputs, num_anchors, num_classes): 106 | '''Create Tiny YOLO_v3 model CNN body in keras.''' 107 | x1 = compose( 108 | DarknetConv2D_BN_Leaky(16, (3,3)), 109 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 110 | DarknetConv2D_BN_Leaky(32, (3,3)), 111 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 112 | DarknetConv2D_BN_Leaky(64, (3,3)), 113 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 114 | DarknetConv2D_BN_Leaky(128, (3,3)), 115 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 116 | DarknetConv2D_BN_Leaky(256, (3,3)))(inputs) 117 | x2 = compose( 118 | MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'), 119 | DarknetConv2D_BN_Leaky(512, (3,3)), 120 | MaxPooling2D(pool_size=(2,2), strides=(1,1), padding='same'), 121 | DarknetConv2D_BN_Leaky(1024, (3,3)), 122 | DarknetConv2D_BN_Leaky(256, (1,1)))(x1) 123 | y1 = compose( 124 | DarknetConv2D_BN_Leaky(512, (3,3)), 125 | DarknetConv2D(num_anchors*(num_classes+5), (1,1)))(x2) 126 | 127 | x2 = compose( 128 | DarknetConv2D_BN_Leaky(128, (1,1)), 129 | UpSampling2D(2))(x2) 130 | y2 = compose( 131 | Concatenate(), 132 | DarknetConv2D_BN_Leaky(256, (3,3)), 133 | DarknetConv2D(num_anchors*(num_classes+5), (1,1)))([x2,x1]) 134 | 135 | return Model(inputs, [y1,y2]) 136 | 137 | """ 138 | 将yolo曾输出格式进行转换,便于进行eval 139 | """ 140 | def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False): 141 | """Convert final layer features to bounding box parameters.""" 142 | num_anchors = len(anchors) 143 | # Reshape to batch, height, width, num_anchors, box_params. 144 | anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2]) 145 | 146 | grid_shape = K.shape(feats)[1:3] # height, width 147 | grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]), 148 | [1, grid_shape[1], 1, 1]) 149 | grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]), 150 | [grid_shape[0], 1, 1, 1]) 151 | grid = K.concatenate([grid_x, grid_y]) 152 | grid = K.cast(grid, K.dtype(feats)) 153 | 154 | feats = K.reshape( 155 | feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5]) 156 | 157 | # Adjust preditions to each spatial grid point and anchor size. 
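    # Written out, the two lines below implement the YOLOv3 box decoding. For raw
    # outputs t_xy, t_wh at grid cell c_xy with anchor p_wh:
    #   b_xy = (sigmoid(t_xy) + c_xy) / grid_wh      # centre, normalised to 0..1
    #   b_wh = p_wh * exp(t_wh) / input_wh           # size,   normalised to 0..1
    # where grid_wh and input_wh are the (width, height) forms of the hw-ordered
    # shapes, hence the [::-1] reversals.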
158 | box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats)) 159 | box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats)) 160 | box_confidence = K.sigmoid(feats[..., 4:5]) 161 | box_class_probs = K.sigmoid(feats[..., 5:]) 162 | 163 | if calc_loss == True: 164 | return grid, feats, box_xy, box_wh 165 | return box_xy, box_wh, box_confidence, box_class_probs 166 | 167 | """ 168 | 检测框形式由 左上角点+宽高 转换为 左上角点+右下角点 169 | """ 170 | def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape): 171 | '''Get corrected boxes''' 172 | box_yx = box_xy[..., ::-1] 173 | box_hw = box_wh[..., ::-1] 174 | input_shape = K.cast(input_shape, K.dtype(box_yx)) 175 | image_shape = K.cast(image_shape, K.dtype(box_yx)) 176 | new_shape = K.round(image_shape * K.min(input_shape/image_shape)) 177 | offset = (input_shape-new_shape)/2./input_shape 178 | scale = input_shape/new_shape 179 | box_yx = (box_yx - offset) * scale 180 | box_hw *= scale 181 | 182 | box_mins = box_yx - (box_hw / 2.) 183 | box_maxes = box_yx + (box_hw / 2.) 184 | boxes = K.concatenate([ 185 | box_mins[..., 0:1], # y_min 186 | box_mins[..., 1:2], # x_min 187 | box_maxes[..., 0:1], # y_max 188 | box_maxes[..., 1:2] # x_max 189 | ]) 190 | 191 | # Scale boxes back to original image shape. 192 | boxes *= K.concatenate([image_shape, image_shape]) 193 | return boxes 194 | 195 | """ 196 | 检测框和类别分数计算 197 | """ 198 | def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape): 199 | '''Process Conv layer output''' 200 | box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, 201 | anchors, num_classes, input_shape) 202 | boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) 203 | boxes = K.reshape(boxes, [-1, 4]) 204 | box_scores = box_confidence * box_class_probs 205 | box_scores = K.reshape(box_scores, [-1, num_classes]) 206 | return boxes, box_scores 207 | 208 | """ 209 | 将yolo层三个尺度的输出结果转换成检测框+分数+类别的形式 210 | """ 211 | def yolo_eval(yolo_outputs, 212 | anchors, 213 | num_classes, 214 | image_shape, 215 | max_boxes=20, 216 | score_threshold=.6, 217 | iou_threshold=.5): 218 | """Evaluate YOLO model on given input and return filtered boxes.""" 219 | num_layers = len(yolo_outputs) 220 | anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting 221 | input_shape = K.shape(yolo_outputs[0])[1:3] * 32 222 | boxes = [] 223 | box_scores = [] 224 | for l in range(num_layers): 225 | _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], 226 | anchors[anchor_mask[l]], num_classes, input_shape, image_shape) 227 | boxes.append(_boxes) 228 | box_scores.append(_box_scores) 229 | boxes = K.concatenate(boxes, axis=0) 230 | box_scores = K.concatenate(box_scores, axis=0) 231 | 232 | mask = box_scores >= score_threshold 233 | max_boxes_tensor = K.constant(max_boxes, dtype='int32') 234 | boxes_ = [] 235 | scores_ = [] 236 | classes_ = [] 237 | for c in range(num_classes): 238 | # TODO: use keras backend instead of tf. 
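        # Per-class NMS (implemented just below): keep only the boxes whose score
        # for class c passed score_threshold, run tf.image.non_max_suppression on
        # them with iou_threshold, and gather at most max_boxes survivors; the
        # per-class results are concatenated afterwards.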
239 | class_boxes = tf.boolean_mask(boxes, mask[:, c]) 240 | class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c]) 241 | nms_index = tf.image.non_max_suppression( 242 | class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold) 243 | class_boxes = K.gather(class_boxes, nms_index) 244 | class_box_scores = K.gather(class_box_scores, nms_index) 245 | classes = K.ones_like(class_box_scores, 'int32') * c 246 | boxes_.append(class_boxes) 247 | scores_.append(class_box_scores) 248 | classes_.append(classes) 249 | boxes_ = K.concatenate(boxes_, axis=0) 250 | scores_ = K.concatenate(scores_, axis=0) 251 | classes_ = K.concatenate(classes_, axis=0) 252 | 253 | return boxes_, scores_, classes_ 254 | 255 | """ 256 | 预处理真值的检测框 257 | """ 258 | def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes): 259 | '''Preprocess true boxes to training input format 260 | 261 | Parameters 262 | ---------- 263 | true_boxes: array, shape=(m, T, 5) 264 | Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape. 265 | input_shape: array-like, hw, multiples of 32 266 | anchors: array, shape=(N, 2), wh 267 | num_classes: integer 268 | 269 | Returns 270 | ------- 271 | y_true: list of array, shape like yolo_outputs, xywh are reletive value 272 | 273 | ''' 274 | assert (true_boxes[..., 4]0 295 | 296 | for b in range(m): 297 | # Discard zero rows. 298 | wh = boxes_wh[b, valid_mask[b]] 299 | if len(wh)==0: continue 300 | # Expand dim to apply broadcasting. 301 | wh = np.expand_dims(wh, -2) 302 | box_maxes = wh / 2. 303 | box_mins = -box_maxes 304 | 305 | intersect_mins = np.maximum(box_mins, anchor_mins) 306 | intersect_maxes = np.minimum(box_maxes, anchor_maxes) 307 | intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.) 308 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 309 | box_area = wh[..., 0] * wh[..., 1] 310 | anchor_area = anchors[..., 0] * anchors[..., 1] 311 | iou = intersect_area / (box_area + anchor_area - intersect_area) 312 | 313 | # Find best anchor for each true box 314 | best_anchor = np.argmax(iou, axis=-1) 315 | 316 | for t, n in enumerate(best_anchor): 317 | for l in range(num_layers): 318 | if n in anchor_mask[l]: 319 | i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32') 320 | j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32') 321 | k = anchor_mask[l].index(n) 322 | c = true_boxes[b,t, 4].astype('int32') 323 | y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4] 324 | y_true[l][b, j, i, k, 4] = 1 325 | y_true[l][b, j, i, k, 5+c] = 1 326 | 327 | return y_true 328 | 329 | """ 330 | 计算检测框之间的交叠比 331 | """ 332 | def box_iou(b1, b2): 333 | '''Return iou tensor 334 | 335 | Parameters 336 | ---------- 337 | b1: tensor, shape=(i1,...,iN, 4), xywh 338 | b2: tensor, shape=(j, 4), xywh 339 | 340 | Returns 341 | ------- 342 | iou: tensor, shape=(i1,...,iN, j) 343 | 344 | ''' 345 | 346 | # Expand dim to apply broadcasting. 347 | b1 = K.expand_dims(b1, -2) 348 | b1_xy = b1[..., :2] 349 | b1_wh = b1[..., 2:4] 350 | b1_wh_half = b1_wh/2. 351 | b1_mins = b1_xy - b1_wh_half 352 | b1_maxes = b1_xy + b1_wh_half 353 | 354 | # Expand dim to apply broadcasting. 355 | b2 = K.expand_dims(b2, 0) 356 | b2_xy = b2[..., :2] 357 | b2_wh = b2[..., 2:4] 358 | b2_wh_half = b2_wh/2. 
359 | b2_mins = b2_xy - b2_wh_half 360 | b2_maxes = b2_xy + b2_wh_half 361 | 362 | intersect_mins = K.maximum(b1_mins, b2_mins) 363 | intersect_maxes = K.minimum(b1_maxes, b2_maxes) 364 | intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.) 365 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 366 | b1_area = b1_wh[..., 0] * b1_wh[..., 1] 367 | b2_area = b2_wh[..., 0] * b2_wh[..., 1] 368 | iou = intersect_area / (b1_area + b2_area - intersect_area) 369 | 370 | return iou 371 | 372 | """ 373 | 损失函数 374 | """ 375 | def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False): 376 | '''Return yolo_loss tensor 377 | 378 | Parameters 379 | ---------- 380 | yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body 381 | y_true: list of array, the output of preprocess_true_boxes 382 | anchors: array, shape=(N, 2), wh 383 | num_classes: integer 384 | ignore_thresh: float, the iou threshold whether to ignore object confidence loss 385 | 386 | Returns 387 | ------- 388 | loss: tensor, shape=(1,) 389 | 390 | ''' 391 | num_layers = len(anchors)//3 # default setting 392 | yolo_outputs = args[:num_layers] 393 | y_true = args[num_layers:] 394 | anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] 395 | input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0])) 396 | grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)] 397 | loss = 0 398 | m = K.shape(yolo_outputs[0])[0] # batch size, tensor 399 | mf = K.cast(m, K.dtype(yolo_outputs[0])) 400 | 401 | for l in range(num_layers): 402 | object_mask = y_true[l][..., 4:5] 403 | true_class_probs = y_true[l][..., 5:] 404 | 405 | grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l], 406 | anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True) 407 | pred_box = K.concatenate([pred_xy, pred_wh]) 408 | 409 | # Darknet raw box to calculate loss. 410 | raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid 411 | raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1]) 412 | raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf 413 | box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4] 414 | 415 | # Find ignore mask, iterate over each of batch. 
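        # The ignore mask computed below marks, per image, the predicted boxes
        # whose best IoU against any ground-truth box stays under ignore_thresh;
        # only those boxes are penalised by the "no object" confidence term, so
        # reasonable predictions that merely were not assigned to this anchor are
        # not punished.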
416 | ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True) 417 | object_mask_bool = K.cast(object_mask, 'bool') 418 | def loop_body(b, ignore_mask): 419 | true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0]) 420 | iou = box_iou(pred_box[b], true_box) 421 | best_iou = K.max(iou, axis=-1) 422 | ignore_mask = ignore_mask.write(b, K.cast(best_iou0: 67 | np.random.shuffle(box) 68 | if len(box)>max_boxes: box = box[:max_boxes] 69 | box[:, [0,2]] = box[:, [0,2]]*scale + dx 70 | box[:, [1,3]] = box[:, [1,3]]*scale + dy 71 | box_data[:len(box)] = box 72 | 73 | return image_data, box_data 74 | 75 | # resize image 76 | new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter) 77 | scale = rand(.25, 2) 78 | if new_ar < 1: 79 | nh = int(scale*h) 80 | nw = int(nh*new_ar) 81 | else: 82 | nw = int(scale*w) 83 | nh = int(nw/new_ar) 84 | image = image.resize((nw,nh), Image.BICUBIC) 85 | 86 | # place image 87 | dx = int(rand(0, w-nw)) 88 | dy = int(rand(0, h-nh)) 89 | new_image = Image.new('RGB', (w,h), (128,128,128)) 90 | new_image.paste(image, (dx, dy)) 91 | image = new_image 92 | 93 | # flip image or not 94 | flip = rand()<.5 95 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 96 | 97 | # distort image 98 | hue = rand(-hue, hue) 99 | sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat) 100 | val = rand(1, val) if rand()<.5 else 1/rand(1, val) 101 | x = rgb_to_hsv(np.array(image)/255.) 102 | x[..., 0] += hue 103 | x[..., 0][x[..., 0]>1] -= 1 104 | x[..., 0][x[..., 0]<0] += 1 105 | x[..., 1] *= sat 106 | x[..., 2] *= val 107 | x[x>1] = 1 108 | x[x<0] = 0 109 | image_data = hsv_to_rgb(x) # numpy array, 0 to 1 110 | 111 | # correct boxes 112 | box_data = np.zeros((max_boxes,5)) 113 | if len(box)>0: 114 | np.random.shuffle(box) 115 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 116 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 117 | if flip: box[:, [0,2]] = w - box[:, [2,0]] 118 | box[:, 0:2][box[:, 0:2]<0] = 0 119 | box[:, 2][box[:, 2]>w] = w 120 | box[:, 3][box[:, 3]>h] = h 121 | box_w = box[:, 2] - box[:, 0] 122 | box_h = box[:, 3] - box[:, 1] 123 | box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box 124 | if len(box)>max_boxes: box = box[:max_boxes] 125 | box_data[:len(box)] = box 126 | 127 | return image_data, box_data 128 | -------------------------------------------------------------------------------- /yolo_valid.py: -------------------------------------------------------------------------------- 1 | """ 2 | Created on April, 2019 3 | @authors: Hulking 4 | """ 5 | 6 | import os 7 | import cv2 8 | import numpy as np 9 | import tensorflow as tf 10 | import sys 11 | import argparse 12 | from yolo import YOLO, detect_video 13 | from PIL import Image 14 | from pycocoEval import cal_coco_map 15 | 16 | import glob 17 | from keras.preprocessing.image import load_img, img_to_array 18 | from keras.applications.imagenet_utils import preprocess_input as preprocess 19 | from keras import backend as K 20 | 21 | np.set_printoptions(suppress=True) # to supress scientific notation in print 22 | 23 | from functools import wraps 24 | 25 | ''' 26 | 计算全部图片检测结果 27 | ''' 28 | def valid_detector(yolo): 29 | classes_path="model_data/coco_classes.txt" 30 | with open(classes_path) as f: 31 | obj_list = f.readlines() 32 | ## remove whitespace characters like `\n` at the end of each line 33 | obj_list = [x.strip() for x in obj_list] 34 | 35 | coco_ids= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 36 | 33, 
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 37 | 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 38 | 89, 90] 39 | 40 | imglist_path="model_data/5k.txt" 41 | dt_result_path = "results/cocoapi_results.json" 42 | 43 | if os.path.exists(dt_result_path): 44 | os.remove(dt_result_path) 45 | with open(dt_result_path, "a") as new_p: 46 | new_p.write("[") 47 | with open(imglist_path) as f: 48 | total_img_list = f.readlines() 49 | total_img_list = [x.strip() for x in total_img_list] 50 | total_img_num = len(total_img_list) 51 | i=0 52 | for image_path in total_img_list: 53 | 54 | if (os.path.exists(image_path)): 55 | print(i,image_path) 56 | 57 | orig_index=int(image_path[50:56]) 58 | img = Image.open(image_path) 59 | 60 | boxes, scores, classes =yolo.valid_image(img) 61 | 62 | for j in range(len(classes)): 63 | coco_id=coco_ids[int(classes[j])] 64 | top, left, bottom, right=boxes[j] 65 | 66 | width=round(right-left,4) 67 | height=round(bottom-top,4) 68 | 69 | # print("\ni, j ",i,j) 70 | # print("\nleft, top, width, height ",left, top, width, height) 71 | 72 | if i==(total_img_num-1) and j== (len(classes)-1): 73 | new_p.write( 74 | "{\"image_id\":" + str(orig_index) + ", \"category_id\":" + str(coco_id) + ", \"bbox\":[" + \ 75 | str(left) + ", " + str(top) + ", " + str(width) + ", " + str(height) + "], \"score\":" + str(scores[j]) + "}") 76 | else: 77 | #print("corrected left, top, width, height", left, top, width, height) 78 | new_p.write( 79 | "{\"image_id\":"+str(orig_index)+", \"category_id\":" +str(coco_id)+ ", \"bbox\":[" + \ 80 | str(left)+ ", " + str(top) + ", " + str(width) + ", " + str(height) + "], \"score\":"+str(scores[j]) +"},\n") 81 | i += 1 82 | new_p.write("]") 83 | print("\n\n\n") 84 | 85 | if __name__=='__main__': 86 | valid_detector(YOLO()) 87 | cal_coco_map() 88 | 89 | 90 | 91 | -------------------------------------------------------------------------------- /yolo_video.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import argparse 3 | from yolo import YOLO, detect_video 4 | from PIL import Image 5 | """ 6 | 单张图片计算 7 | """ 8 | def detect_img(yolo): 9 | while True: 10 | img = input('Input image filename:') 11 | print(img) 12 | try: 13 | image = Image.open(img) 14 | except: 15 | print('Open Error! 
Try again!') 16 | continue 17 | else: 18 | r_image = yolo.valid_image(image) 19 | r_image.show() 20 | yolo.close_session() 21 | 22 | FLAGS = None 23 | 24 | if __name__ == '__main__': 25 | # class YOLO defines the default value, so suppress any default here 26 | parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS) 27 | ''' 28 | Command line options 29 | ''' 30 | parser.add_argument( 31 | '--model', type=str, 32 | help='path to model weight file, default ' + YOLO.get_defaults("model_path") 33 | ) 34 | 35 | parser.add_argument( 36 | '--anchors', type=str, 37 | help='path to anchor definitions, default ' + YOLO.get_defaults("anchors_path") 38 | ) 39 | 40 | parser.add_argument( 41 | '--classes', type=str, 42 | help='path to class definitions, default ' + YOLO.get_defaults("classes_path") 43 | ) 44 | 45 | parser.add_argument( 46 | '--gpu_num', type=int, 47 | help='Number of GPU to use, default ' + str(YOLO.get_defaults("gpu_num")) 48 | ) 49 | 50 | parser.add_argument( 51 | '--image', default=False, action="store_true", 52 | help='Image detection mode, will ignore all positional arguments' 53 | ) 54 | ''' 55 | Command line positional arguments -- for video detection mode 56 | ''' 57 | parser.add_argument( 58 | "--input", nargs='?', type=str,required=False,default='./path2your_video', 59 | help = "Video input path" 60 | ) 61 | 62 | parser.add_argument( 63 | "--output", nargs='?', type=str, default="", 64 | help = "[Optional] Video output path" 65 | ) 66 | 67 | FLAGS = parser.parse_args() 68 | 69 | if FLAGS.image: 70 | """ 71 | Image detection mode, disregard any remaining command line arguments 72 | """ 73 | print("Image detection mode") 74 | if "input" in FLAGS: 75 | print(" Ignoring remaining command line arguments: " + FLAGS.input + "," + FLAGS.output) 76 | detect_img(YOLO(**vars(FLAGS))) 77 | elif "input" in FLAGS: 78 | detect_video(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output) 79 | else: 80 | print("Must specify at least video_input_path. 
See usage with --help.") 81 | -------------------------------------------------------------------------------- /yolov3-tiny.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=2 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=16 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=32 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=64 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [maxpool] 58 | size=2 59 | stride=2 60 | 61 | [convolutional] 62 | batch_normalize=1 63 | filters=128 64 | size=3 65 | stride=1 66 | pad=1 67 | activation=leaky 68 | 69 | [maxpool] 70 | size=2 71 | stride=2 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=256 76 | size=3 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [maxpool] 82 | size=2 83 | stride=2 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=512 88 | size=3 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [maxpool] 94 | size=2 95 | stride=1 96 | 97 | [convolutional] 98 | batch_normalize=1 99 | filters=1024 100 | size=3 101 | stride=1 102 | pad=1 103 | activation=leaky 104 | 105 | ########### 106 | 107 | [convolutional] 108 | batch_normalize=1 109 | filters=256 110 | size=1 111 | stride=1 112 | pad=1 113 | activation=leaky 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=512 118 | size=3 119 | stride=1 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | size=1 125 | stride=1 126 | pad=1 127 | filters=255 128 | activation=linear 129 | 130 | 131 | 132 | [yolo] 133 | mask = 3,4,5 134 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 135 | classes=80 136 | num=6 137 | jitter=.3 138 | ignore_thresh = .7 139 | truth_thresh = 1 140 | random=1 141 | 142 | [route] 143 | layers = -4 144 | 145 | [convolutional] 146 | batch_normalize=1 147 | filters=128 148 | size=1 149 | stride=1 150 | pad=1 151 | activation=leaky 152 | 153 | [upsample] 154 | stride=2 155 | 156 | [route] 157 | layers = -1, 8 158 | 159 | [convolutional] 160 | batch_normalize=1 161 | filters=256 162 | size=3 163 | stride=1 164 | pad=1 165 | activation=leaky 166 | 167 | [convolutional] 168 | size=1 169 | stride=1 170 | pad=1 171 | filters=255 172 | activation=linear 173 | 174 | [yolo] 175 | mask = 1,2,3 176 | anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319 177 | classes=80 178 | num=6 179 | jitter=.3 180 | ignore_thresh = .7 181 | truth_thresh = 1 182 | random=1 183 | -------------------------------------------------------------------------------- /yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=32 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | 
burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 
| activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 
463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | 
batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | --------------------------------------------------------------------------------