├── README.md
├── etc
│   ├── cover01.jpg
│   └── cover02.jpg
├── font
│   ├── FiraMono-Medium.otf
│   └── SIL+Open+Font+License.txt
├── images
│   └── test4.jpg
├── model_data
│   ├── __pycache__
│   │   └── yad2k.cpython-35.pyc
│   ├── coco_classes.txt
│   └── yolo_anchors.txt
├── out
│   └── test4.jpg
├── yad2k
│   ├── models
│   │   ├── __pycache__
│   │   │   ├── keras_darknet19.cpython-35.pyc
│   │   │   └── keras_yolo.cpython-35.pyc
│   │   ├── keras_darknet19.py
│   │   └── keras_yolo.py
│   └── utils
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-35.pyc
│       │   └── utils.cpython-35.pyc
│       └── utils.py
├── yolo.py
└── yolo_utils.py
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# YOLOv2 Object Detection w/ Keras (in just 20 lines of code)

This repository presents a quick and simple implementation of YOLOv2 object detection using the Keras library with a TensorFlow backend.
Credit goes to the [YAD2K library](https://github.com/allanzelener/YAD2K), on top of which this implementation was built.

![cover01](etc/cover01.jpg)
![cover02](etc/cover02.jpg)
###### Note that I do not hold ownership of any of the above pictures. They are used purely for educational purposes, to illustrate the concepts.
--------------------------------------------------------------------------------
## Thoughts on the implementation

YOLO is a well-known technique used to perform fast localization of multiple objects in a single image.

A brief algorithm breakdown:
- Divide the image using a grid (e.g. 19x19).
- Perform image classification and localization on each grid cell. The result is, for each cell, a vector representing the probability that an object is present, the dimensions of its bounding box, and the class of the detected object.
- Apply score thresholding to discard detections with a low class score.
- Apply non-max suppression to prune overlapping boxes that cover the same object.
- Additionally, anchor boxes are used to detect several objects in one grid cell.

###### If you want to dive into how the above points are implemented in code, refer to the yolo_eval function in the keras_yolo.py file in the yad2k/models directory. A minimal sketch of the filtering steps is given below.
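To make the filtering steps concrete, here is a minimal NumPy sketch of score thresholding followed by greedy non-max suppression. This is an illustration only, not the repository's actual code: the real implementation in yad2k/models/keras_yolo.py (yolo_filter_boxes and yolo_eval) works on Keras/TensorFlow tensors and uses tf.image.non_max_suppression, and the function and variable names here are made up for the example.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    inter_mins = np.maximum(a[:2], b[:2])
    inter_maxes = np.minimum(a[2:], b[2:])
    inter_h, inter_w = np.maximum(inter_maxes - inter_mins, 0.)
    intersection = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)

def filter_and_nms(box_confidence, boxes, box_class_probs,
                   score_threshold=0.6, iou_threshold=0.5):
    """Score thresholding followed by greedy non-max suppression."""
    # Per-box, per-class score: P(object) * P(class | object).
    scores = box_confidence * box_class_probs   # shape (N, num_classes)
    classes = scores.argmax(axis=-1)            # best class per box
    class_scores = scores.max(axis=-1)          # score of that class

    # 1) Thresholding: drop boxes whose best class score is too low.
    keep = class_scores >= score_threshold
    boxes, class_scores, classes = boxes[keep], class_scores[keep], classes[keep]

    # 2) NMS: repeatedly keep the highest-scoring box and discard any
    #    remaining box that overlaps it by more than iou_threshold.
    order = class_scores.argsort()[::-1]
    picked = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        picked.append(best)
        overlaps = np.array([iou(boxes[best], boxes[k]) for k in rest])
        order = rest[overlaps < iou_threshold]
    return boxes[picked], class_scores[picked], classes[picked]
```

Given box_confidence of shape (N, 1), box_class_probs of shape (N, 80) and boxes of shape (N, 4), this returns only the surviving boxes with their scores and class indices, which is the same kind of result yolo_eval produces with TensorFlow ops.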
Paper reference: [YOLO9000: Better, Faster, Stronger](https://arxiv.org/abs/1612.08242) by Joseph Redmon and Ali Farhadi.

[Keras](https://keras.io/) is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Using Keras helps in understanding the concepts underlying an ML technique by reducing the coding workload. This implementation is therefore a good platform for beginners to grasp the core concepts and quickly get working results.

Also, for anyone looking to integrate object detection capabilities into an application, this code can be incorporated with just a few lines.

Blog post on this: https://medium.com/@miranthaj/quick-implementation-of-yolo-v2-with-keras-ebf6eb40c684

--------------------------------------------------------------------------------

## Quick Start

- Clone this repository to your PC.
- Download any Darknet model cfg and weights from the [official YOLO website](http://pjreddie.com/darknet/yolo/).
- Convert the downloaded cfg and weights files into an h5 file using the YAD2K library. (This is explained step by step in the More Details section below.)
- Copy the generated h5 file to the model_data folder and edit the name of the pretrained model in yolo.py to the name of your h5 file.
- Place the input image you want to run object detection on in the images folder and copy its file name.
- Assign your input image file name to the input_image_name variable in yolo.py.
- Open a terminal in the repository directory and run yolo.py:

`python yolo.py`

--------------------------------------------------------------------------------

## More Details

How to convert cfg and weights files to h5 using the YAD2K library (Windows):

- Clone the [YAD2K library](https://github.com/allanzelener/YAD2K) to your PC.
- Open a terminal in the cloned directory.
- Copy and paste the downloaded weights and cfg files into the YAD2K master directory.
- Run `python yad2k.py yolo.cfg yolo.weights model_data/yolo.h5` in the terminal and the h5 file will be generated.
- Move the generated h5 file to the model_data folder of this repository.


-------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/etc/cover01.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/etc/cover01.jpg
--------------------------------------------------------------------------------
/etc/cover02.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/etc/cover02.jpg
--------------------------------------------------------------------------------
/font/FiraMono-Medium.otf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/font/FiraMono-Medium.otf
--------------------------------------------------------------------------------
/font/SIL+Open+Font+License.txt:
--------------------------------------------------------------------------------
Copyright (c) 2014, Mozilla Foundation https://mozilla.org/ with Reserved Font Name Fira Mono.

Copyright (c) 2014, Telefonica S.A.

This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at: http://scripts.sil.org/OFL

-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------

PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide development of collaborative font projects, to support the font creation efforts of academic and linguistic communities, and to provide a free and open framework in which fonts may be shared and improved in partnership with others.

The OFL allows the licensed fonts to be used, studied, modified and redistributed freely as long as they are not sold by themselves. The fonts, including any derivative works, can be bundled, embedded, redistributed and/or sold with any software provided that any reserved names are not used by derivative works.
The fonts and derivatives, however, cannot be released under any other type of license. The requirement for fonts to remain under this license does not apply to any document created using the fonts or their derivatives. 16 | 17 | DEFINITIONS 18 | "Font Software" refers to the set of files released by the Copyright Holder(s) under this license and clearly marked as such. This may include source files, build scripts and documentation. 19 | 20 | "Reserved Font Name" refers to any names specified as such after the copyright statement(s). 21 | 22 | "Original Version" refers to the collection of Font Software components as distributed by the Copyright Holder(s). 23 | 24 | "Modified Version" refers to any derivative made by adding to, deleting, or substituting -- in part or in whole -- any of the components of the Original Version, by changing formats or by porting the Font Software to a new environment. 25 | 26 | "Author" refers to any designer, engineer, programmer, technical writer or other person who contributed to the Font Software. 27 | 28 | PERMISSION & CONDITIONS 29 | Permission is hereby granted, free of charge, to any person obtaining a copy of the Font Software, to use, study, copy, merge, embed, modify, redistribute, and sell modified and unmodified copies of the Font Software, subject to the following conditions: 30 | 31 | 1) Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself. 32 | 33 | 2) Original or Modified Versions of the Font Software may be bundled, redistributed and/or sold with any software, provided that each copy contains the above copyright notice and this license. These can be included either as stand-alone text files, human-readable headers or in the appropriate machine-readable metadata fields within text or binary files as long as those fields can be easily viewed by the user. 34 | 35 | 3) No Modified Version of the Font Software may use the Reserved Font Name(s) unless explicit written permission is granted by the corresponding Copyright Holder. This restriction only applies to the primary font name as presented to the users. 36 | 37 | 4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font Software shall not be used to promote, endorse or advertise any Modified Version, except to acknowledge the contribution(s) of the Copyright Holder(s) and the Author(s) or with their explicit written permission. 38 | 39 | 5) The Font Software, modified or unmodified, in part or in whole, must be distributed entirely under this license, and must not be distributed under any other license. The requirement for fonts to remain under this license does not apply to any document created using the Font Software. 40 | 41 | TERMINATION 42 | This license becomes null and void if any of the above conditions are not met. 43 | 44 | DISCLAIMER 45 | THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER DEALINGS IN THE FONT SOFTWARE. 
-------------------------------------------------------------------------------- /images/test4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/images/test4.jpg -------------------------------------------------------------------------------- /model_data/__pycache__/yad2k.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/model_data/__pycache__/yad2k.cpython-35.pyc -------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /model_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828 2 | -------------------------------------------------------------------------------- /out/test4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/out/test4.jpg -------------------------------------------------------------------------------- /yad2k/models/__pycache__/keras_darknet19.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/yad2k/models/__pycache__/keras_darknet19.cpython-35.pyc -------------------------------------------------------------------------------- /yad2k/models/__pycache__/keras_yolo.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/yad2k/models/__pycache__/keras_yolo.cpython-35.pyc -------------------------------------------------------------------------------- /yad2k/models/keras_darknet19.py: -------------------------------------------------------------------------------- 1 | """Darknet19 Model Defined in Keras.""" 2 | import functools 3 | from functools import partial 4 | 5 | from keras.layers import Conv2D, 
MaxPooling2D 6 | from keras.layers.advanced_activations import LeakyReLU 7 | from keras.layers.normalization import BatchNormalization 8 | from keras.models import Model 9 | from keras.regularizers import l2 10 | 11 | from ..utils import compose 12 | 13 | # Partial wrapper for Convolution2D with static default argument. 14 | _DarknetConv2D = partial(Conv2D, padding='same') 15 | 16 | 17 | @functools.wraps(Conv2D) 18 | def DarknetConv2D(*args, **kwargs): 19 | """Wrapper to set Darknet weight regularizer for Convolution2D.""" 20 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)} 21 | darknet_conv_kwargs.update(kwargs) 22 | return _DarknetConv2D(*args, **darknet_conv_kwargs) 23 | 24 | 25 | def DarknetConv2D_BN_Leaky(*args, **kwargs): 26 | """Darknet Convolution2D followed by BatchNormalization and LeakyReLU.""" 27 | no_bias_kwargs = {'use_bias': False} 28 | no_bias_kwargs.update(kwargs) 29 | return compose( 30 | DarknetConv2D(*args, **no_bias_kwargs), 31 | BatchNormalization(), 32 | LeakyReLU(alpha=0.1)) 33 | 34 | 35 | def bottleneck_block(outer_filters, bottleneck_filters): 36 | """Bottleneck block of 3x3, 1x1, 3x3 convolutions.""" 37 | return compose( 38 | DarknetConv2D_BN_Leaky(outer_filters, (3, 3)), 39 | DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)), 40 | DarknetConv2D_BN_Leaky(outer_filters, (3, 3))) 41 | 42 | 43 | def bottleneck_x2_block(outer_filters, bottleneck_filters): 44 | """Bottleneck block of 3x3, 1x1, 3x3, 1x1, 3x3 convolutions.""" 45 | return compose( 46 | bottleneck_block(outer_filters, bottleneck_filters), 47 | DarknetConv2D_BN_Leaky(bottleneck_filters, (1, 1)), 48 | DarknetConv2D_BN_Leaky(outer_filters, (3, 3))) 49 | 50 | 51 | def darknet_body(): 52 | """Generate first 18 conv layers of Darknet-19.""" 53 | return compose( 54 | DarknetConv2D_BN_Leaky(32, (3, 3)), 55 | MaxPooling2D(), 56 | DarknetConv2D_BN_Leaky(64, (3, 3)), 57 | MaxPooling2D(), 58 | bottleneck_block(128, 64), 59 | MaxPooling2D(), 60 | bottleneck_block(256, 128), 61 | MaxPooling2D(), 62 | bottleneck_x2_block(512, 256), 63 | MaxPooling2D(), 64 | bottleneck_x2_block(1024, 512)) 65 | 66 | 67 | def darknet19(inputs): 68 | """Generate Darknet-19 model for Imagenet classification.""" 69 | body = darknet_body()(inputs) 70 | logits = DarknetConv2D(1000, (1, 1), activation='softmax')(body) 71 | return Model(inputs, logits) 72 | -------------------------------------------------------------------------------- /yad2k/models/keras_yolo.py: -------------------------------------------------------------------------------- 1 | """YOLO_v2 Model Defined in Keras.""" 2 | import sys 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | from keras import backend as K 7 | from keras.layers import Lambda 8 | from keras.layers.merge import concatenate 9 | from keras.models import Model 10 | 11 | from ..utils import compose 12 | from .keras_darknet19 import (DarknetConv2D, DarknetConv2D_BN_Leaky, darknet_body) 13 | 14 | sys.path.append('..') 15 | 16 | voc_anchors = np.array( 17 | [[1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]]) 18 | 19 | voc_classes = [ 20 | "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", 21 | "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", 22 | "pottedplant", "sheep", "sofa", "train", "tvmonitor" 23 | ] 24 | 25 | 26 | def space_to_depth_x2(x): 27 | """Thin wrapper for Tensorflow space_to_depth with block_size=2.""" 28 | # Import currently required to make Lambda work. 
29 | # See: https://github.com/fchollet/keras/issues/5088#issuecomment-273851273 30 | import tensorflow as tf 31 | return tf.space_to_depth(x, block_size=2) 32 | 33 | 34 | def space_to_depth_x2_output_shape(input_shape): 35 | """Determine space_to_depth output shape for block_size=2. 36 | 37 | Note: For Lambda with TensorFlow backend, output shape may not be needed. 38 | """ 39 | return (input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 * 40 | input_shape[3]) if input_shape[1] else (input_shape[0], None, None, 41 | 4 * input_shape[3]) 42 | 43 | 44 | def yolo_body(inputs, num_anchors, num_classes): 45 | """Create YOLO_V2 model CNN body in Keras.""" 46 | darknet = Model(inputs, darknet_body()(inputs)) 47 | conv20 = compose( 48 | DarknetConv2D_BN_Leaky(1024, (3, 3)), 49 | DarknetConv2D_BN_Leaky(1024, (3, 3)))(darknet.output) 50 | 51 | conv13 = darknet.layers[43].output 52 | conv21 = DarknetConv2D_BN_Leaky(64, (1, 1))(conv13) 53 | # TODO: Allow Keras Lambda to use func arguments for output_shape? 54 | conv21_reshaped = Lambda( 55 | space_to_depth_x2, 56 | output_shape=space_to_depth_x2_output_shape, 57 | name='space_to_depth')(conv21) 58 | 59 | x = concatenate([conv21_reshaped, conv20]) 60 | x = DarknetConv2D_BN_Leaky(1024, (3, 3))(x) 61 | x = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(x) 62 | return Model(inputs, x) 63 | 64 | 65 | def yolo_head(feats, anchors, num_classes): 66 | """Convert final layer features to bounding box parameters. 67 | 68 | Parameters 69 | ---------- 70 | feats : tensor 71 | Final convolutional layer features. 72 | anchors : array-like 73 | Anchor box widths and heights. 74 | num_classes : int 75 | Number of target classes. 76 | 77 | Returns 78 | ------- 79 | box_xy : tensor 80 | x, y box predictions adjusted by spatial location in conv layer. 81 | box_wh : tensor 82 | w, h box predictions adjusted by anchors and conv spatial resolution. 83 | box_conf : tensor 84 | Probability estimate for whether each box contains any object. 85 | box_class_pred : tensor 86 | Probability distribution estimate for each box over class labels. 87 | """ 88 | num_anchors = len(anchors) 89 | # Reshape to batch, height, width, num_anchors, box_params. 90 | anchors_tensor = K.reshape(K.variable(anchors), [1, 1, 1, num_anchors, 2]) 91 | # Static implementation for fixed models. 92 | # TODO: Remove or add option for static implementation. 93 | # _, conv_height, conv_width, _ = K.int_shape(feats) 94 | # conv_dims = K.variable([conv_width, conv_height]) 95 | 96 | # Dynamic implementation of conv dims for fully convolutional model. 97 | conv_dims = K.shape(feats)[1:3] # assuming channels last 98 | # In YOLO the height index is the inner most iteration. 99 | conv_height_index = K.arange(0, stop=conv_dims[0]) 100 | conv_width_index = K.arange(0, stop=conv_dims[1]) 101 | conv_height_index = K.tile(conv_height_index, [conv_dims[1]]) 102 | 103 | # TODO: Repeat_elements and tf.split doesn't support dynamic splits. 
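    # Instead, build the per-cell offset grid below by tiling the height and
    # width indices and stacking them into a (1, conv_height, conv_width, 1, 2)
    # tensor that can be added to the box predictions.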
    # conv_width_index = K.repeat_elements(conv_width_index, conv_dims[1], axis=0)
    conv_width_index = K.tile(K.expand_dims(conv_width_index, 0), [conv_dims[0], 1])
    conv_width_index = K.flatten(K.transpose(conv_width_index))
    conv_index = K.transpose(K.stack([conv_height_index, conv_width_index]))
    conv_index = K.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2])
    conv_index = K.cast(conv_index, K.dtype(feats))

    feats = K.reshape(feats, [-1, conv_dims[0], conv_dims[1], num_anchors, num_classes + 5])
    conv_dims = K.cast(K.reshape(conv_dims, [1, 1, 1, 1, 2]), K.dtype(feats))

    # Static generation of conv_index:
    # conv_index = np.array([_ for _ in np.ndindex(conv_width, conv_height)])
    # conv_index = conv_index[:, [1, 0]]  # swap columns for YOLO ordering.
    # conv_index = K.variable(
    #     conv_index.reshape(1, conv_height, conv_width, 1, 2))
    # feats = Reshape(
    #     (conv_dims[0], conv_dims[1], num_anchors, num_classes + 5))(feats)

    box_confidence = K.sigmoid(feats[..., 4:5])
    box_xy = K.sigmoid(feats[..., :2])
    box_wh = K.exp(feats[..., 2:4])
    box_class_probs = K.softmax(feats[..., 5:])

    # Adjust predictions to each spatial grid point and anchor size.
    # Note: YOLO iterates over height index before width index.
    box_xy = (box_xy + conv_index) / conv_dims
    box_wh = box_wh * anchors_tensor / conv_dims

    return box_confidence, box_xy, box_wh, box_class_probs


def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert YOLO box predictions to bounding box corners."""
    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)

    return K.concatenate([
        box_mins[..., 1:2],  # y_min
        box_mins[..., 0:1],  # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]  # x_max
    ])


def yolo_loss(args,
              anchors,
              num_classes,
              rescore_confidence=False,
              print_loss=False):
    """YOLO localization loss function.

    Parameters
    ----------
    yolo_output : tensor
        Final convolutional layer features.

    true_boxes : tensor
        Ground truth boxes tensor with shape [batch, num_true_boxes, 5]
        containing box x_center, y_center, width, height, and class.

    detectors_mask : array
        0/1 mask for detector positions where there is a matching ground truth.

    matching_true_boxes : array
        Corresponding ground truth boxes for positive detector positions.
        Already adjusted for conv height and width.

    anchors : tensor
        Anchor boxes for model.

    num_classes : int
        Number of object classes.

    rescore_confidence : bool, default=False
        If true then set confidence target to IOU of best predicted box with
        the closest matching ground truth box.

    print_loss : bool, default=False
        If True then use a tf.Print() to print the loss components.

    Returns
    -------
    mean_loss : float
        mean localization loss across minibatch
    """
    (yolo_output, true_boxes, detectors_mask, matching_true_boxes) = args
    num_anchors = len(anchors)
    object_scale = 5
    no_object_scale = 1
    class_scale = 1
    coordinates_scale = 1
    # Note: yolo_head returns (confidence, xy, wh, class_probs), so unpack in
    # that order.
    pred_confidence, pred_xy, pred_wh, pred_class_prob = yolo_head(
        yolo_output, anchors, num_classes)

    # Unadjusted box predictions for loss.
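    # pred_boxes below keeps the raw sigmoid(x, y) and unadjusted (w, h)
    # activations so they can be compared directly with matching_true_boxes.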
199 | # TODO: Remove extra computation shared with yolo_head. 200 | yolo_output_shape = K.shape(yolo_output) 201 | feats = K.reshape(yolo_output, [ 202 | -1, yolo_output_shape[1], yolo_output_shape[2], num_anchors, 203 | num_classes + 5 204 | ]) 205 | pred_boxes = K.concatenate( 206 | (K.sigmoid(feats[..., 0:2]), feats[..., 2:4]), axis=-1) 207 | 208 | # TODO: Adjust predictions by image width/height for non-square images? 209 | # IOUs may be off due to different aspect ratio. 210 | 211 | # Expand pred x,y,w,h to allow comparison with ground truth. 212 | # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params 213 | pred_xy = K.expand_dims(pred_xy, 4) 214 | pred_wh = K.expand_dims(pred_wh, 4) 215 | 216 | pred_wh_half = pred_wh / 2. 217 | pred_mins = pred_xy - pred_wh_half 218 | pred_maxes = pred_xy + pred_wh_half 219 | 220 | true_boxes_shape = K.shape(true_boxes) 221 | 222 | # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params 223 | true_boxes = K.reshape(true_boxes, [ 224 | true_boxes_shape[0], 1, 1, 1, true_boxes_shape[1], true_boxes_shape[2] 225 | ]) 226 | true_xy = true_boxes[..., 0:2] 227 | true_wh = true_boxes[..., 2:4] 228 | 229 | # Find IOU of each predicted box with each ground truth box. 230 | true_wh_half = true_wh / 2. 231 | true_mins = true_xy - true_wh_half 232 | true_maxes = true_xy + true_wh_half 233 | 234 | intersect_mins = K.maximum(pred_mins, true_mins) 235 | intersect_maxes = K.minimum(pred_maxes, true_maxes) 236 | intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.) 237 | intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1] 238 | 239 | pred_areas = pred_wh[..., 0] * pred_wh[..., 1] 240 | true_areas = true_wh[..., 0] * true_wh[..., 1] 241 | 242 | union_areas = pred_areas + true_areas - intersect_areas 243 | iou_scores = intersect_areas / union_areas 244 | 245 | # Best IOUs for each location. 246 | best_ious = K.max(iou_scores, axis=4) # Best IOU scores. 247 | best_ious = K.expand_dims(best_ious) 248 | 249 | # A detector has found an object if IOU > thresh for some true box. 250 | object_detections = K.cast(best_ious > 0.6, K.dtype(best_ious)) 251 | 252 | # TODO: Darknet region training includes extra coordinate loss for early 253 | # training steps to encourage predictions to match anchor priors. 254 | 255 | # Determine confidence weights from object and no_object weights. 256 | # NOTE: YOLO does not use binary cross-entropy here. 257 | no_object_weights = (no_object_scale * (1 - object_detections) * 258 | (1 - detectors_mask)) 259 | no_objects_loss = no_object_weights * K.square(-pred_confidence) 260 | 261 | if rescore_confidence: 262 | objects_loss = (object_scale * detectors_mask * 263 | K.square(best_ious - pred_confidence)) 264 | else: 265 | objects_loss = (object_scale * detectors_mask * 266 | K.square(1 - pred_confidence)) 267 | confidence_loss = objects_loss + no_objects_loss 268 | 269 | # Classification loss for matching detections. 270 | # NOTE: YOLO does not use categorical cross-entropy loss here. 271 | matching_classes = K.cast(matching_true_boxes[..., 4], 'int32') 272 | matching_classes = K.one_hot(matching_classes, num_classes) 273 | classification_loss = (class_scale * detectors_mask * 274 | K.square(matching_classes - pred_class_prob)) 275 | 276 | # Coordinate loss for matching detection boxes. 
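    # (matching_true_boxes stores x, y as offsets within the grid cell and
    # w, h as log-ratios to the matched anchor; see preprocess_true_boxes.)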
277 | matching_boxes = matching_true_boxes[..., 0:4] 278 | coordinates_loss = (coordinates_scale * detectors_mask * 279 | K.square(matching_boxes - pred_boxes)) 280 | 281 | confidence_loss_sum = K.sum(confidence_loss) 282 | classification_loss_sum = K.sum(classification_loss) 283 | coordinates_loss_sum = K.sum(coordinates_loss) 284 | total_loss = 0.5 * ( 285 | confidence_loss_sum + classification_loss_sum + coordinates_loss_sum) 286 | if print_loss: 287 | total_loss = tf.Print( 288 | total_loss, [ 289 | total_loss, confidence_loss_sum, classification_loss_sum, 290 | coordinates_loss_sum 291 | ], 292 | message='yolo_loss, conf_loss, class_loss, box_coord_loss:') 293 | 294 | return total_loss 295 | 296 | 297 | def yolo(inputs, anchors, num_classes): 298 | """Generate a complete YOLO_v2 localization model.""" 299 | num_anchors = len(anchors) 300 | body = yolo_body(inputs, num_anchors, num_classes) 301 | outputs = yolo_head(body.output, anchors, num_classes) 302 | return outputs 303 | 304 | 305 | def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=.6): 306 | """Filter YOLO boxes based on object and class confidence.""" 307 | 308 | box_scores = box_confidence * box_class_probs 309 | box_classes = K.argmax(box_scores, axis=-1) 310 | box_class_scores = K.max(box_scores, axis=-1) 311 | prediction_mask = box_class_scores >= threshold 312 | 313 | # TODO: Expose tf.boolean_mask to Keras backend? 314 | boxes = tf.boolean_mask(boxes, prediction_mask) 315 | scores = tf.boolean_mask(box_class_scores, prediction_mask) 316 | classes = tf.boolean_mask(box_classes, prediction_mask) 317 | 318 | return boxes, scores, classes 319 | 320 | 321 | def yolo_eval(yolo_outputs, 322 | image_shape, 323 | max_boxes=10, 324 | score_threshold=.6, 325 | iou_threshold=.5): 326 | """Evaluate YOLO model on given input batch and return filtered boxes.""" 327 | box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs 328 | boxes = yolo_boxes_to_corners(box_xy, box_wh) 329 | boxes, scores, classes = yolo_filter_boxes( 330 | box_confidence, boxes, box_class_probs, threshold=score_threshold) 331 | 332 | # Scale boxes back to original image shape. 333 | height = image_shape[0] 334 | width = image_shape[1] 335 | image_dims = K.stack([height, width, height, width]) 336 | image_dims = K.reshape(image_dims, [1, 4]) 337 | boxes = boxes * image_dims 338 | 339 | # TODO: Something must be done about this ugly hack! 340 | max_boxes_tensor = K.variable(max_boxes, dtype='int32') 341 | K.get_session().run(tf.variables_initializer([max_boxes_tensor])) 342 | nms_index = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold=iou_threshold) 343 | # nms_index = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold=iou_threshold) 344 | boxes = K.gather(boxes, nms_index) 345 | scores = K.gather(scores, nms_index) 346 | classes = K.gather(classes, nms_index) 347 | 348 | return boxes, scores, classes 349 | 350 | 351 | def preprocess_true_boxes(true_boxes, anchors, image_size): 352 | """Find detector in YOLO where ground truth box should appear. 353 | 354 | Parameters 355 | ---------- 356 | true_boxes : array 357 | List of ground truth boxes in form of relative x, y, w, h, class. 358 | Relative coordinates are in the range [0, 1] indicating a percentage 359 | of the original image dimensions. 360 | anchors : array 361 | List of anchors in form of w, h. 362 | Anchors are assumed to be in the range [0, conv_size] where conv_size 363 | is the spatial dimension of the final convolutional features. 
    image_size : array-like
        List of image dimensions in form of h, w in pixels.

    Returns
    -------
    detectors_mask : array
        0/1 mask for detectors in [conv_height, conv_width, num_anchors, 1]
        that should be compared with a matching ground truth box.
    matching_true_boxes: array
        Same shape as detectors_mask with the corresponding ground truth box
        adjusted for comparison with predicted parameters at training time.
    """
    height, width = image_size
    num_anchors = len(anchors)
    # Downsampling factor of 5x 2-stride max_pools == 32.
    # TODO: Remove hardcoding of downscaling calculations.
    assert height % 32 == 0, 'Image sizes in YOLO_v2 must be multiples of 32.'
    assert width % 32 == 0, 'Image sizes in YOLO_v2 must be multiples of 32.'
    conv_height = height // 32
    conv_width = width // 32
    num_box_params = true_boxes.shape[1]
    detectors_mask = np.zeros(
        (conv_height, conv_width, num_anchors, 1), dtype=np.float32)
    matching_true_boxes = np.zeros(
        (conv_height, conv_width, num_anchors, num_box_params),
        dtype=np.float32)

    for box in true_boxes:
        # scale box to convolutional feature spatial dimensions
        box_class = box[4:5]
        box = box[0:4] * np.array(
            [conv_width, conv_height, conv_width, conv_height])
        # Clamp cell indices so boxes on the bottom/right edge stay in range.
        i = min(np.floor(box[1]).astype('int'), conv_height - 1)
        j = min(np.floor(box[0]).astype('int'), conv_width - 1)
        best_iou = 0
        best_anchor = 0

        for k, anchor in enumerate(anchors):
            # Find IOU between box shifted to origin and anchor box.
            box_maxes = box[2:4] / 2.
            box_mins = -box_maxes
            anchor_maxes = (anchor / 2.)
            anchor_mins = -anchor_maxes

            intersect_mins = np.maximum(box_mins, anchor_mins)
            intersect_maxes = np.minimum(box_maxes, anchor_maxes)
            intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
411 | intersect_area = intersect_wh[0] * intersect_wh[1] 412 | box_area = box[2] * box[3] 413 | anchor_area = anchor[0] * anchor[1] 414 | iou = intersect_area / (box_area + anchor_area - intersect_area) 415 | if iou > best_iou: 416 | best_iou = iou 417 | best_anchor = k 418 | 419 | if best_iou > 0: 420 | detectors_mask[i, j, best_anchor] = 1 421 | adjusted_box = np.array( 422 | [ 423 | box[0] - j, box[1] - i, 424 | np.log(box[2] / anchors[best_anchor][0]), 425 | np.log(box[3] / anchors[best_anchor][1]), box_class 426 | ], 427 | dtype=np.float32) 428 | matching_true_boxes[i, j, best_anchor] = adjusted_box 429 | return detectors_mask, matching_true_boxes 430 | -------------------------------------------------------------------------------- /yad2k/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .utils import * 2 | -------------------------------------------------------------------------------- /yad2k/utils/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/yad2k/utils/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /yad2k/utils/__pycache__/utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/miranthajayatilake/YOLOw-Keras/d1526fc6a64ccb163e26f1e2504389e565249dc8/yad2k/utils/__pycache__/utils.cpython-35.pyc -------------------------------------------------------------------------------- /yad2k/utils/utils.py: -------------------------------------------------------------------------------- 1 | """Miscellaneous utility functions.""" 2 | 3 | from functools import reduce 4 | 5 | 6 | def compose(*funcs): 7 | """Compose arbitrarily many functions, evaluated left to right. 

    Reference: https://mathieularose.com/function-composition-in-python/
    """
    # return lambda x: reduce(lambda v, f: f(v), funcs, x)
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')
--------------------------------------------------------------------------------
/yolo.py:
--------------------------------------------------------------------------------
# Import the needed modules
import os
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
from PIL import Image

from keras import backend as K
from keras.models import load_model

# The following functions from yolo_utils.py are used
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes

# These functions from the yad2k library are used
from yad2k.models.keras_yolo import yolo_head, yolo_eval


# Provide the name of the image that you saved in the images folder to be fed through the network
input_image_name = "test4.jpg"

# Obtain the dimensions of the input image
input_image = Image.open("images/" + input_image_name)
width, height = input_image.size
width = np.array(width, dtype=float)
height = np.array(height, dtype=float)

# Assign the shape of the input image to the image_shape variable
image_shape = (height, width)


# Load the classes and the anchor boxes that are provided in the model_data folder
class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")

# Load the pretrained model. Please refer to the README for how to obtain the yolo.h5 file
yolo_model = load_model("model_data/yolo.h5")

# Print the summary of the model
yolo_model.summary()
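# Sanity check (an assumption about the standard COCO-trained model, not part
# of the original script): for a 608x608 input with 5 anchors and 80 classes,
# the summary should end in a (None, 19, 19, 425) output, since 608 / 32 = 19
# and 5 * (80 + 5) = 425.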
# Convert final layer features to bounding box parameters
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

# The yolo_eval function selects the best boxes using score filtering and non-max suppression.
# If you want to see how this works in detail, refer to the keras_yolo.py file in yad2k/models
boxes, scores, classes = yolo_eval(yolo_outputs, image_shape)


# Initiate a session
sess = K.get_session()


# Preprocess the input image before feeding it into the convolutional network
image, image_data = preprocess_image("images/" + input_image_name, model_image_size=(608, 608))

# Run the session
out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})


# Print the results
print('Found {} boxes for {}'.format(len(out_boxes), input_image_name))
# Produce the colors for the bounding boxes
colors = generate_colors(class_names)
# Draw the bounding boxes
draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
# Apply the predicted bounding boxes to the image and save it
image.save(os.path.join("out", input_image_name), quality=90)
output_image = scipy.misc.imread(os.path.join("out", input_image_name))
imshow(output_image)
--------------------------------------------------------------------------------
/yolo_utils.py:
--------------------------------------------------------------------------------
import colorsys
import imghdr
import os
import random
from keras import backend as K

import numpy as np
from PIL import Image, ImageDraw, ImageFont

def read_classes(classes_path):
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def read_anchors(anchors_path):
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    anchors = np.array(anchors).reshape(-1, 2)
    return anchors

def generate_colors(class_names):
    hsv_tuples = [(x / len(class_names), 1., 1.) for x in range(len(class_names))]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))
    random.seed(10101)  # Fixed seed for consistent colors across runs.
    random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
    random.seed(None)  # Reset seed to default.
    return colors

def scale_boxes(boxes, image_shape):
    """Scales the predicted boxes in order to be drawable on the image."""
    height = image_shape[0]
    width = image_shape[1]
    image_dims = K.stack([height, width, height, width])
    image_dims = K.reshape(image_dims, [1, 4])
    boxes = boxes * image_dims
    return boxes

def preprocess_image(img_path, model_image_size):
    image_type = imghdr.what(img_path)
    image = Image.open(img_path)
    resized_image = image.resize(tuple(reversed(model_image_size)), Image.BICUBIC)
    image_data = np.array(resized_image, dtype='float32')
    image_data /= 255.
    image_data = np.expand_dims(image_data, 0)  # Add batch dimension.
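    # Return the original PIL image (used later for drawing boxes) together
    # with the normalised, batched array that is fed to the network.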
48 | return image, image_data 49 | 50 | def draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors): 51 | 52 | font = ImageFont.truetype(font='font/FiraMono-Medium.otf',size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 53 | thickness = (image.size[0] + image.size[1]) // 300 54 | 55 | for i, c in reversed(list(enumerate(out_classes))): 56 | predicted_class = class_names[c] 57 | box = out_boxes[i] 58 | score = out_scores[i] 59 | 60 | label = '{} {:.2f}'.format(predicted_class, score) 61 | 62 | draw = ImageDraw.Draw(image) 63 | label_size = draw.textsize(label, font) 64 | 65 | top, left, bottom, right = box 66 | top = max(0, np.floor(top + 0.5).astype('int32')) 67 | left = max(0, np.floor(left + 0.5).astype('int32')) 68 | bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32')) 69 | right = min(image.size[0], np.floor(right + 0.5).astype('int32')) 70 | print(label, (left, top), (right, bottom)) 71 | 72 | if top - label_size[1] >= 0: 73 | text_origin = np.array([left, top - label_size[1]]) 74 | else: 75 | text_origin = np.array([left, top + 1]) 76 | 77 | # My kingdom for a good redistributable image drawing library. 78 | for i in range(thickness): 79 | draw.rectangle([left + i, top + i, right - i, bottom - i], outline=colors[c]) 80 | draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=colors[c]) 81 | draw.text(text_origin, label, fill=(0, 0, 0), font=font) 82 | del draw --------------------------------------------------------------------------------